CHAPTER ONE

INTRODUCTION



Language variation limits communication. For this reason, language variation
is a vital concern to educators, government officials, broadcasters, publishers,
writers, missionaries, and anyone else who has a message to communicate. Many of the
developing nations of the world face the challenge of trying to communicate with a
multilingual population, a population which may include well over a hundred dialects
or languages. Even among nations where a single national language is firmly
established, gross dialect variations of the national language and pockets of
minority languages still exist.

It may not be thought feasible for a country to initiate projects such as mass
communication, bilingual education, or vernacular literature production in every one
of its languages and dialects. On the other hand, if that country wishes to reach
all of its citizens, it must carry out its programs in languages that are both
understood and accepted by all groups concerned. The urgent need, then, is for a way
to determine which specific dialect or dialects are the most useful in reaching a
given population. This thesis develops strategies for understanding how language
variation limits communication and for devising solutions which will help overcome
these limits to communication.

1.1 An overview

Chapter 2 deals with gathering the fundamental data for a study of language
variation and limits to communication. It addresses the question of how to measure
communication. It can be measured by devising tests which allow the investigator to
observe how well one group understands the speech of another. First I describe in
some detail a method of testing understanding which I used in field studies in the
Solomon Islands. Then I briefly review a number of methods which other
investigators have used. Finally I propose a taxonomy of intelligibility testing
methods. My conclusion is that no one method of testing intelligibility is
inherently better than another; rather the choice of a method depends on the
particular situation. The resulting discussion should serve as a guide to the
prospective field investigator for helping select a method of measuring
communication which is best suited to his goals and the capabilities of the people
among whom he will do the testing.

Fortunately, communicating with every citizen in a particular region does not
usually require that a vernacular language program be initiated in each one of its
dialects. Chapter 3 tells how the data gathered by the methods of Chapter 2 can be
analyzed to determine how many vernacular language programs are needed in an area
and where those programs should be centered. A major deterrent to vernacular
language programs is the high cost of setting them up and keeping them going. The
techniques presented in Chapter 3 find the least costly solutions to establishing
vernacular language programs in an area by finding groupings of the dialects which
minimize the number of language programs required while at the same time
guaranteeing that all citizens will adequately understand the language of at least
one of the programs.
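The grouping problem described above can be illustrated with a small sketch. This is not the technique of Chapter 3, which is not reproduced in this introduction; it is a simple greedy heuristic, and it is not guaranteed to find the smallest possible number of programs. The function name, the score table, and the 0.8 threshold for "adequate" understanding are all hypothetical.

```python
def choose_centers(understands, threshold):
    """Greedy sketch: pick center dialects until every dialect is covered.

    understands[listener][speaker] is a hypothetical score (0.0 to 1.0) for
    how well speakers of `listener` understand the speech of `speaker`.
    """
    dialects = set(understands)
    # covers[c] = dialects whose speakers adequately understand candidate c
    # (every dialect is assumed to understand itself).
    covers = {
        c: {d for d in dialects
            if d == c or understands[d].get(c, 0.0) >= threshold}
        for c in dialects
    }
    uncovered = set(dialects)
    centers = []
    while uncovered:
        # Take the candidate that reaches the most still-uncovered dialects;
        # ties are broken alphabetically.
        best = max(sorted(dialects), key=lambda c: len(covers[c] & uncovered))
        centers.append(best)
        uncovered -= covers[best]
    return centers


# Invented scores for four dialects A to D.
scores = {
    "A": {"B": 0.9, "C": 0.3},
    "B": {"A": 0.5, "C": 0.4},
    "C": {"A": 0.2, "B": 0.85},
    "D": {"A": 0.1, "B": 0.2, "C": 0.1},
}
centers = choose_centers(scores, threshold=0.8)
print(centers)
```

With these invented scores, a program centered on B reaches A, B, and C, and dialect D needs a program of its own.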

Chapters 4, 5, and 6 form a unit on the topic of explaining communication. The
methods for measuring communication discussed in Chapter 2 tell us only whether or
not communication can take place and to what extent. The methods for finding
centers of communication in Chapter 3 allow us to take advantage of measured
patterns of communication in finding the least costly solutions to communicating
with all the citizens of a region. However, neither method explains why there is
communication at all or why the patterns of communication should be what they are.
By understanding why patterns of communication are what they are, and not just what
they are, it is possible to make better proposals about language planning in an
area. Furthermore, by understanding the factors which contribute to intelligibility
in an area, it is possible to estimate intelligibility relations which it is not
feasible to measure.

The approach to explaining communication is one of building models. Chapter 4
concentrates on the subject of modeling itself. After a discussion of the meaning
and advantages of modeling, a basic model for explaining communication is proposed.
The model suggests that the amount of understanding between dialects depends on two
factors: the linguistic similarity between dialects and the social relationships
between them.

In Chapter 5 the factor of linguistic similarity is considered in detail.
After a general discussion of various aspects of linguistic similarity and how they
can be measured, data from ten different field studies are analyzed to explore the
relationship between lexical similarity and intelligibility. As a conclusion, a
general model for expressing this relationship is proposed.

In Chapter 6 the second factor of the model, social relations, is considered in
detail. First the role of social relations in explaining communication and ways of
measuring social relations are discussed. Then data from the island of Santa Cruz,
Solomon Islands, are considered. A more comprehensive model which embraces social
relationships as well as linguistic ones is used to explain communication between
dialects on the island. The predictions derived from this model are over 90%
accurate.

1.2 Some definitions: intelligibility and dialect

Before proceeding with the text, two terms need to be defined: intelligibility
and dialect. The problem is not so much that people do not know what they mean, but
that they mean different things to different people. Therefore, I now define them
in the way that they will be used throughout the thesis.

Intelligibility is synonymous with understanding and comprehension. (The root
word is intelligible, not intelligence.) Dialect intelligibility refers
specifically to the degree to which speakers of one dialect understand the speech of
another dialect. Some linguists who have studied dialect intelligibility restrict
the term to mean only a theoretical expected degree of understanding of individuals
who have had no experience with the other dialect. For instance, Gillian Sankoff
defines intelligibility in this way (1969:839-84[?]). If understanding is boosted by
experience with the other dialect, then she contrasts that with intelligibility by
calling it "bilingualism". She uses the term "incipient bilingualism" to refer to a
degree of bilingualism which does not imply a great deal of learning.




I do not define intelligibility in this way. If a person understands another
dialect then that dialect is intelligible to him. Bilingualism and incipient
bilingualism do not contrast with intelligibility; they are special cases of
intelligibility. Whenever I refer to that special case of intelligibility which is
the theoretical degree of understanding between dialects whose speakers have had no
contact, I use the term inherent intelligibility.

Another common use of the term in the literature is in the phrase mutual
intelligibility. This phrase was coined in the early studies of intelligibility in
the fifties (Section 2.2.1). Those investigators were actually trying to measure
inherent intelligibility and they averaged the intelligibility in both directions
between a pair of dialects in order to approximate a measure of linguistic
similarity which they thought should be symmetric. This relationship they termed
mutual intelligibility. Somehow the phrase "mutual intelligibility" became
interchangeable with the term "intelligibility" in the general literature. They are
not, however: intelligibility is not usually a two-way phenomenon.
A's intelligibility of B's speech is a different thing than B's intelligibility of
A's speech. Intelligibility is mutual if and only if the degree of understanding is
the same in both directions. It sometimes is, but asymmetric linguistic and social
relations often make it otherwise. Mutual intelligibility is not synonymous with
intelligibility; it is another special case of intelligibility.
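The distinction between directional and mutual intelligibility can be made concrete with a small sketch. The percentage scores are invented, and the function names and the optional tolerance are assumptions introduced only for illustration.

```python
def averaged_intelligibility(a_of_b, b_of_a):
    """Average of the two directional scores, as in the early studies."""
    return (a_of_b + b_of_a) / 2


def is_mutual(a_of_b, b_of_a, tolerance=0):
    """Mutual only if understanding is the same in both directions.

    `tolerance` is a hypothetical allowance for measurement error;
    the default of 0 applies the definition strictly.
    """
    return abs(a_of_b - b_of_a) <= tolerance


# Invented scores in percent: A understands 90% of B's speech,
# while B understands only 60% of A's speech.
average = averaged_intelligibility(90, 60)
mutual = is_mutual(90, 60)
print(average, mutual)
```

The averaged figure of 75 hides the fact that understanding here is far from the same in the two directions.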

The second term that needs defining is dialect. Two popular level notions of
dialect are that it refers to a funny way of speaking or to a way of speaking that
differs from a standard or prestigious language. But in a linguistic view the term
carries no such connotations; it refers simply to a variety of speech. Some
linguists have attempted to define dialect precisely so as to assign it an exact
place within a hierarchy of linguistic taxonomy. All such definitions end up being
arbitrary, however, and none has received widespread acceptance. The only
satisfactory definitions seem to be loose ones. Charles Hockett gives a good
example (1958:322):



A language ... is a collection of more or less similar idiolects. A
dialect is just the same thing, with this difference: when both terms are
used in a single discussion, the degree of similarity of the idiolects in
a single dialect is presumed to be greater than that of all the idiolects
in the language.



Throughout this thesis, when I use the term dialect, I shall be referring to a
collection of similar idiolects. I use the term dialect group, or sometimes just
dialect for short, to refer to the group of people who speak those idiolects.

The kinds of dialects which I investigate in this thesis, and which other
investigators whom I cite have investigated, are regional or community dialects,
that is, the variety of speech which is common to the individuals in a region or a
local community like a town or village. Social dialects which cut through regions
or communities have yet to be investigated using dialect intelligibility
methodologies. Therefore the local community actually serves as the minimal unit in
defining the dialects considered in this thesis. That is, dialect refers to the
variety of speech common to a local community or a more inclusive grouping of
communities. Two dialects are distinguished if their respective speakers recognize
that the varieties of speech are different. The degree of difference is not at
issue in distinguishing dialects, only the fact that there is a difference.
CHAPTER 2
MEASURING COMMUNICATION



It is a popular notion that people who understand each other's speech speak the
same language and that people who cannot understand each other speak different
languages ([citation illegible] 1969:203, [citation illegible] 197[?]:69). Thus it is
that intelligibility testing has gained importance as a method for determining
whether or not two speech communities use the same language (Voegelin and Harris
1951, [citation illegible] 1962). Since intelligibility testing was first described
in 1951 by C. F. Voegelin and Zellig Harris the method has been refined by a number
of investigators. Thus far it has reached its fullest development in Eugene Casad's
1974 manual, Dialect Intelligibility Testing.

In this chapter many different methods of testing intelligibility are
discussed. I begin in Section 2.1 with a detailed discussion of how to conduct an
intelligibility survey based on the method of intelligibility testing I used in the
Solomon Islands. This method is applicable in situations where the investigator
shares a common language with illiterate test subjects. Where subjects are
[illegible], other methods are appropriate. In Section 2.2 other methods of testing
intelligibility are reviewed. The basic outline of conducting a survey remains the
same; only the details about constructing and administering the tests differ.

As a conclusion, Section 2.3 develops a taxonomy of intelligibility testing
methods and evaluates the situations in which each method is most appropriate. It
is argued that no method is inherently better than another; rather, the evaluation
depends on the situation in which the testing is done. An optimal method is defined
as one which yields the greatest amount of information with the least amount of
effort. It is shown that different methods are optimal in different situations.
The analysis in Section 2.3 should serve as a guide to field investigators for
selecting a method of intelligibility testing.

2.1 Conducting an intelligibility survey

Intelligibility between dialects is measured by observing how well speakers of
one dialect understand a recorded text from another dialect. To carry out the
tests, the survey area must be visited twice: once to collect the texts and a
second time to do the testing. An intelligibility survey consists
of four steps: (1) planning the survey, (2) collecting the texts, (3) preparing
test tapes, and (4) administering the tests. The final step of processing and
analyzing the results is treated in following chapters. This basic outline of an
intelligibility survey holds for all the methods described in this chapter. The
method of collecting, preparing, and administering tests described in this section
is the one I used in the Solomon Islands (Simons 1977a). For other complete overviews
of a dialect intelligibility survey see Linda Simons 1977 and chapter 2 of Casad
1974.



2.1.1 Planning the survey 

The purpose of the planning stage is to determine which villages must be
visited during the intelligibility testing survey. It is usually not necessary to
conduct a test in every village within the survey area. Rather, we need only to
test one representative village for each different dialect in the area. Therefore,
it is wise to use any maps, census data, or linguistic and anthropological
publications about the area to determine the location and extent of each of the
different dialect groups within the area. Often there will be very little such
material available and the investigator may have to rely almost entirely upon his
first visit into the area to gather this information. In this case the information
is gathered by talking to local people to gain their opinions about the dialect
groupings within the area. Through this questioning the investigator gains a rough
sketch of the dialect situation within the area. This preliminary picture is bound
to be incomplete. The investigator must be sure to maintain a flexibility to follow
new leads as they are uncovered at later stages in the survey.

After all the presumed dialects have been located the investigator can plan a
route for the survey trip through the area. Ideally, he should plan to visit one
village for each of the dialect groups turned up in this preliminary stage of the
survey. The actual villages which are visited may be determined on the basis of the
presence of roads or trails or nearness to other villages which must be visited.
Local opinions about which villages are important ones should also be considered.
It is generally wise to visit the most remote village in the survey area last. If
it is visited last there is no need to return again. The test tapes for all the
other dialects will have been collected and prepared by that time and the
administering of the intelligibility tests can begin at that village.

Another important aspect of the planning stage is a pilot survey in which the
methods of collecting, preparing, and administering the tests are tried out in one
or two villages before the actual collection phase begins. This trial run may point
to modifications needed in the method before it is too late to change.



2.1.2 Collecting the texts

On the first trip through the survey area the investigator stops at each of the
villages selected in the planning stage in order to collect texts which are to be
used in intelligibility testing. If a more extensive language survey is being
conducted, one which also includes study of linguistic similarity and social
relations between dialects, these data should also be collected during this first
trip through the area. This allows the investigator to have a good look over all
the data before making the second trip. During the second trip he will then be more
aware of the whole setting and will have opportunity to ask further questions about
social relations or to check up on linguistic data that may look questionable.

The informants chosen to give the texts for the intelligibility tests should be
native speakers of the local dialect and also speakers of a language shared by the
investigator or his assistant. The investigator should first carefully screen the
informant to be sure he or she is adequate. This is done by asking the informant
where he was born, where his parents were born, if he has lived or worked in any
other areas, the languages his parents and spouse speak, and other questions which
will help to determine if the informant is truly a native speaker of the local
dialect. Special care must be taken in areas where men or women marry into villages
other than their own; half the adults in a village may not be native to it.
It is important that the investigator clearly explain to his informant what
kind of text is wanted. The text should be fairly short. An ideal length is two and
a half to three minutes, though texts as short as one and a half minutes or as long
as [illegible] have been used successfully. The subject matter should be
autobiographical in nature, rather than folkloristic or procedural. Folkloristic
and procedural texts often contain a special style and vocabulary. Also there
is a widespread knowledge of both folklore and procedures which makes them
unacceptable subject matter for intelligibility testing, because only minimal cues
may be needed to make the whole content of such a text accessible. Thus an
autobiographical text which will be unpredictable to the listener in its content is
most desirable.

It may be helpful for the investigator to suggest topics to the informant. Some
possible topics are: what he did yesterday, a favorite hunting or fishing story,
[illegible], or a recent trip. If the investigator already has collected a few
good texts from other villages which this informant may understand, it may be
helpful to play these for him so he may get an idea of what is expected of him.

The informant may appreciate a practice run to tell his story before it is
recorded. This may help to put him at ease, allow him to organize his thoughts, and
also give the investigator an idea whether or not the story is appropriate. The
investigator can then ask questions about the content and help the informant bring
out details in the episode which may improve the quality of the test. If a text is
recorded and then proves to be too short, the same kind of technique can be used.
The investigator can ask questions about what has been recorded and offer
suggestions as to how the text could be expanded. Then the informant can be given a
chance to add more to the end of what has already been recorded.

After a good text has been recorded it must be translated into a language which
the investigator can understand. This would ordinarily be a trade language or the
national language if he is not familiar with the vernaculars in the area. This is
done using two tape recorders (Voegelin and Harris 1951:328, L. Simons 1977:240).
The first tape recorder is used to play back the original text in short sections.
These sections should correspond to natural breaks in the text. After each section,
the storyteller is asked to give a translation of that section. The second tape
recorder is left running during this whole process in order to record both the
original text and its translation. The result is like an interlinear translation of
the original text. The completeness and accuracy of the translation can be verified
by getting another translation of the story from someone else or by administering
the completed test tape to other speakers of that dialect.

2.1.3 Preparing the test tapes

The first step in preparing a test tape is to transcribe the interlinear
translation tape. Unless the vernacular texts are also needed for grammatical
analysis or comparison, there is no need to make an exact morpheme by morpheme
transcription and translation of the text. The vernacular portion of the text may
be transcribed in broad outline only, noting mainly the intonation contours and the
final syllables preceding pauses. The translation, however, should be transcribed
in full. The complete translation is then studied to break up the text into logical
segments. When possible, these segments should be defined both in terms of their
content and of having final intonation contours. They should be long enough so that
questions can be asked about the content of the segment, but not so long that a
listener would be likely to forget what took place at the beginning of a segment
before he reached the end. Around fifteen seconds is an optimal length for a
segment.

The actual test tape consists of two parts. In the first part, the first one
or two minutes of the text are copied without a break. In the second part the
entire text is copied in the short segments defined above. The purpose of the first
part of the test tape is simply to allow the listeners to tune in to the speaker's
voice and to the new dialect which they are about to be tested on. The second part
of the test tape, which is divided into sections, comprises the actual test. In the
testing situation this form of the text is played back segment by segment, and after
each segment listeners are asked to make a response.

The test tape is made using two tape recorders. In one, the original
vernacular text is placed; in the other, a blank tape which will be the test tape is
placed. To record the first part of the test, the uninterrupted section of text,
the transcription should be studied to find a logical breaking point which is one to
two minutes into the text. If the text is short, this first part of the test may
include the whole text. If the text is long, it will save time in the testing to
cut the text short for the first part. The blank tape is then set to record while
the original text is played and this first section is dubbed onto the blank tape.
At the selected breaking point, both tapes are stopped and the original tape is
rewound. The second tape is allowed to move forward about ten seconds in order to
make a blank space between the first and second parts of the test. Next the
original text is dubbed onto the test tape segment by segment. The segments should
already be marked off in the transcription of the text. As the investigator makes
this test tape he follows the broad transcription of the vernacular text to be able
to determine where each segment ends. At the end of each segment, the original tape
is put on "pause" while the test tape is allowed to keep running in order to insert
a blank space of about five seconds between segments. This process is continued
until the whole text is copied onto the test tape, segment by segment.
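The layout of the finished test tape can be sketched as a simple timeline. The function name and the example durations are hypothetical; the ten-second and five-second gaps are the approximate values given above.

```python
def tape_layout(segment_lengths, intro_length, gap_after_intro=10, gap_between=5):
    """Return (label, start, end) tuples, in seconds, for the test tape.

    Part one is the uninterrupted opening of the text; part two is the
    whole text again, copied segment by segment with a pause after each.
    """
    layout = [("intro", 0, intro_length)]
    position = intro_length + gap_after_intro
    for number, length in enumerate(segment_lengths, start=1):
        layout.append((f"segment {number}", position, position + length))
        position += length + gap_between
    return layout


# Invented example: a 90-second opening and three segments of roughly
# fifteen seconds each.
layout = tape_layout([15, 14, 16], intro_length=90)
for label, start, end in layout:
    print(f"{label}: {start}s to {end}s")
```

A timeline of this kind may also help in estimating how much blank tape a given text will need.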

2.1.4 Administering the tests

The first step in administering the tests is deciding which test tapes should
be played in each of the villages visited on the second round of the survey. If the
survey area includes more than half a dozen different dialects it becomes impossible
to administer every test tape in every village. In general, one should not
administer more than five tapes to any one individual or group, due to fatigue of
both the subjects and the investigator. The investigator, therefore, must guess
which tests will give the most information at any given village. If it is
absolutely necessary that a large number of tapes be tested in one village, it can
be done by playing one set of tapes to some subjects and another set to others.
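Dividing a long list of tapes into sets for different subjects can be sketched as follows. The function name and the tape labels are hypothetical, and the sketch assumes that the limit of five applies to the tapes from other dialects; the text does not say whether the subjects' own hometown test counts toward it.

```python
def split_into_sets(tapes, max_per_subject=5):
    """Split the tapes into consecutive sets of at most `max_per_subject`."""
    return [
        tapes[start:start + max_per_subject]
        for start in range(0, len(tapes), max_per_subject)
    ]


# Invented example: seven tapes must be tested in one village.
tape_sets = split_into_sets(["A", "B", "C", "D", "E", "F", "G"])
for number, tape_set in enumerate(tape_sets, start=1):
    print(f"subjects in set {number} hear tapes {tape_set}")
```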


To determine which dialects to test at a given village, the investigator must
rely on the data which have already been collected from the area either in the
planning stage or in the collecting trip. If, according to information already
available, it is already apparent that the similarity between two dialects is very
high, then in general there is no need to test their intelligibility. By the same
token, if similarity is known to be extremely low, there generally will not be a
need to test intelligibility. Also one can rely on opinions that have been
collected during the first round, that is, the opinions of people in the villages
as to what languages they can or cannot understand. The purpose of the
intelligibility testing at this point is to fill in the gaps in the information, to
concentrate on cases where the investigator is not sure from other evidence whether
he can expect understanding or not.
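This selection rule can be sketched as a simple filter over similarity estimates. The text gives no numerical cutoffs, so the `low` and `high` thresholds, the function name, and the similarity figures are all hypothetical; in practice local opinions would also inform the choice.

```python
def pairs_worth_testing(similarity, low=0.3, high=0.9):
    """Keep only the dialect pairs whose similarity leaves the outcome in doubt.

    `similarity` maps a pair of dialect names to an estimated similarity
    between 0.0 and 1.0. Pairs at or beyond either cutoff are skipped.
    """
    return sorted(pair for pair, value in similarity.items() if low < value < high)


# Invented estimates gathered during the planning stage and collecting trip.
estimates = {
    ("A", "B"): 0.95,  # very high similarity: no need to test
    ("A", "C"): 0.60,  # uncertain: worth testing
    ("B", "C"): 0.10,  # extremely low similarity: no need to test
}
to_test = pairs_worth_testing(estimates)
print(to_test)
```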

The investigator may also be guided in his choice of which test tapes to
administer by the characteristics of the dialect or village from which the tapes
have come. Where the goal of the survey is to determine centers of communication
for use in literature programs, then the investigator may want to concentrate
testing efforts on the villages or dialects which might best serve as centers. This
notion of centrality is based not only on linguistic and intelligibility relations
but also on geography, accessibility, population, economy, politics, and the
facilities (such as stores, schools, churches, clinics, airstrips) that are
available in a place (see Sections 6.1.[?], 6.1.4, 6.3.2; also J. Sanders 1977).


The intelligibility tests can be administered to groups of people or to
individuals. Group testing can be used when the investigator can assume a
homogeneity across the population as to multilingual experience; a sampling of
individuals is tested when he cannot. The assumption of homogeneity or
heterogeneity can be based on results at other villages in the survey and on the
opinions of local people. (The topic of group testing versus individual testing is
discussed in more detail in Section 2.3.2.) When individuals are tested, they
should be isolated (which can be done with earphones) so that other potential
subjects will not be disqualified by hearing the test and the answers. The
investigator should screen the subjects to ensure that they are native speakers of
the dialect, as was done for the storytellers (Section 2.1.2). The screening
questions will also reveal if a subject has had a degree of contact with some other
dialects which is beyond the ordinary.


When a whole group is tested at once, it is rather awkward to go around the
whole group and screen the subjects first. In this case the screening can be done
as the testing progresses. In group testing, a spokesman for the group will
generally emerge. When questions or translations are asked of the group, the group
is free to discuss and come up with an answer which the spokesman will pass on to
the investigator. If it becomes clear that the spokesman or another individual is
dominating a particular test, screening questions should be asked to determine if
that person has had close contact with the village being tested for. If so,
different individuals from the group should be asked directly for their responses to
remaining segments in the test. This allows the investigator to get a sample of the
understanding of the whole group.

The first tape played to any group is the test tape made of their own dialect,
which is called the hometown test. This test gives the listeners the practice of
taking the test without the added obstacle of dialect differences to overcome.
During this hometown test, not only do the listeners have the chance to practice the
test format, but also the investigator has the chance to evaluate the subjects as to
their suitability for testing. It is during this hometown test that the
investigator may discover deficiencies in the abilities of the group or an individual
subject in translating into the common language. Thus the hometown test acts not
only as a practice test for new subjects, but also as a control on their bilingual
abilities in the common language.

When administering a test, the first part of it, the one or two minutes of
continuous text, is played without interruption. Here the listeners are given the
opportunity of hearing the new dialect. The investigator may choose to withhold the
identity of the dialect and see if the listeners can identify it after hearing this
first section. When this first part of the test comes to an end, the investigator
stops the tape and explains that now the entire story will be played from the
beginning one segment at a time. At the end of each segment the investigator stops
the tape during the pause and the individual subject or someone from the group is
asked to translate that much of the story into the common language. If he hesitates
the investigator may ask leading questions to get him started. If an important
point has been omitted from the translation the investigator may ask specific
questions to find out if the point was actually not understood or if it was just
overlooked in the subject's translation. If none of the subjects can translate or
answer any questions, ask if they understood anything, if there were any words or
phrases they recognized.
The responses to each individual segment of the text should be recorded in a
booklet. A convenient way to do this is to estimate the fraction of the segment
which was understood, that is, record a one if all of it was understood, a zero if
none of it was understood, one-half if half was understood, and so on. If only a
word or a phrase was understood, that word or phrase may be written down. At the
end of the test the responses are reviewed and the listeners' understanding of the
test tape is summarized as being one of the four levels of intelligibility in the
scale below. If a group is tested, then the understanding of the population
(assumed to be homogeneous) is summarized as being of a single level. However, when
it is found that an individual is dominating the answers, and then a sampling of the
group is obtained to counteract it, it may be reported that a few with extra
experience understand at one level while the majority understand at another. If
individuals are tested, then the understanding of the population is reported as the
distribution of the levels of understanding among individuals. The four levels of
intelligibility are as follows:



3 = full intelLigibility - The listeners understood everything. At most Wiey 
missed a few details of the story. In 3ome cases a group may have difficulty 
responding to- the first few sections, but after that they adjust to the new dialect 
and translate all/remaining segments fully and correctly. This should be scored as 
full intelligibiLl/ty, , „ , 



2 = partial intelligibility - The listeners understood the main points of the
story but missed many details. This level of understanding is characterized by
incomplete understanding of segments throughout the story. The listeners understood
enough, though, that they would need only to ask a few questions of the speaker to
fill in the missing details. This is a level of potential full intelligibility.



1 = sporadic recognition - The listeners understood only isolated words and
phrases, perhaps even occasional sentences. However, they did not know what was
happening in the story.



0 = no understanding - The listeners understood nothing. Perhaps they
recognized a common word like 'man' or 'house', or an important cultural item like
'betel nut'; however, there was no consistent recognition of isolated words or
phrases.



Note that only the relative ordering between the levels is defined, not the
relative distance between them. Thus, level 3 represents more understanding than
level 2, and 2 more than 1. However, the distance between 1 and 2 is probably
greater than that between 2 and 3.
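
Because only the ordering of the levels is defined, results for individuals are
safer to report as a distribution than as an arithmetic mean. The following Python
sketch illustrates this; the names and the eight scores are hypothetical, and the
use of a median as a single "typical" level is an assumption of this sketch rather
than part of the method described above.

```python
from collections import Counter
from enum import IntEnum
from statistics import median_low


class Level(IntEnum):
    """The four ordinal levels of intelligibility defined in the text."""
    NO_UNDERSTANDING = 0
    SPORADIC_RECOGNITION = 1
    PARTIAL_INTELLIGIBILITY = 2
    FULL_INTELLIGIBILITY = 3


def level_distribution(levels):
    """Count how many individuals were judged to be at each level."""
    counts = Counter(levels)
    return {level: counts.get(level, 0) for level in Level}


def typical_level(levels):
    """Return the low median level; this uses only the ordering of levels."""
    return Level(median_low(levels))


# Hypothetical results for eight individuals in one village (illustration only).
village = [Level(3), Level(3), Level(2), Level(2),
           Level(2), Level(2), Level(1), Level(2)]
distribution = level_distribution(village)
```

A mean of these codes would silently assume equal spacing between levels, which
the note above rules out.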

2.1.5 A summary of time requirements for the method

The method requires two hours for the preparation of each test tape. When
tests are administered to a group, it takes only one hour to conduct the tests in a
particular village. Thus the method requires three hours per village in the survey.
This compares favorably to the amount of time needed for a conventional
lexicostatistic survey. However, if tests are administered to a sample of
individuals, then the testing phase will take considerably longer, as is detailed
below.

The preparation of the test tapes for any dialect consists of the following
four steps: (1) elicit the text, (2) roughly transcribe the text, (3) decide how
to divide the text into segments, (4) prepare the test tape. The elicitation of the
text takes one hour with an informant. This includes the time required to
explain what is wanted, play it back for the informant, get an interlinear
translation of the text, and also play that back.

The remaining three steps generally require another hour. The following time
figures are based on records kept on the preparation of eleven test tapes for the
dialect survey of Santa Cruz Island (Simons 1977a). Transcribing one minute of text
took from five and a half to eight minutes, with an average of six and three-quarter
minutes. This time includes making the rough transcription of the vernacular texts
from the interlinear translation tape and then an exact transcription of the
translation for each portion. Thus, to transcribe the ideal text of three minutes'
length took an average of 20 minutes. After the transcription was finished, it took
about ten minutes to read it over and decide where to make the breaks between
segments and what leading questions could be used to prompt subjects when their
response was not immediate. It took another 10 minutes to set up the two tape
recorders and dub the test tape. Finally it took about 15 minutes to type up the
transcription of the translation of the text with gaps in that transcription
corresponding to the breaks in the test tape, and with leading questions typed into
the gaps. This is a total of 55 minutes. An advantage of the method is that all of
this test preparation is done without the aid of informants. Therefore, it need not
be done at the test site but can be done at another place where the investigator may
have set up a camp.

When the tests are administered during the second trip to the dialects, the
hometown test and the four or five other test tapes can be administered to a group
in one hour. If tests are administered to individuals, the process will go faster
without group discussion time. About 45 minutes are required for an individual
subject. That comes to three hours for four subjects, six hours for eight subjects,
or seven and a half hours for ten subjects. To do a thorough job of testing
intelligibility over a complete cross section of the population may require $0 to 46
subjects. Typically, testing in such depth would be done in only one or two
villages out of the entire survey area in order to get a feel for the homogeneity or
heterogeneity of multilingual abilities within the village populations. The
in-depth studies would point out the factors, if any, which explain differences in
understanding (for example, sex, age, or schooling) and would give a basis for
interpreting results in the rest of the survey where only a small number of
individuals were tested.
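
The time figures above can be checked with a few lines of arithmetic. The Python
sketch below simply restates the reported averages; the constant and function names
are hypothetical, and the figures are those of the Santa Cruz survey as quoted in
the text, not general constants.

```python
# Average preparation times (minutes) reported above for one three-minute text.
ELICITATION_MIN = 60      # session with the informant
TRANSCRIPTION_MIN = 20    # rough transcription plus transcribed translation
SEGMENTATION_MIN = 10     # choosing breaks and leading questions
DUBBING_MIN = 10          # setting up two recorders and dubbing the tape
TYPING_MIN = 15           # typing the translation with gaps and questions

GROUP_TEST_MIN = 60       # one village tested as a group
INDIVIDUAL_TEST_MIN = 45  # one subject tested alone


def steps_without_informant_minutes():
    """Time for steps (2) to (4), which need no informant."""
    return TRANSCRIPTION_MIN + SEGMENTATION_MIN + DUBBING_MIN + TYPING_MIN


def preparation_minutes():
    """Total preparation time for one test tape, in minutes."""
    return ELICITATION_MIN + steps_without_informant_minutes()


def testing_hours(subjects=None):
    """Testing time in hours: one group session if subjects is None,
    otherwise 45 minutes for each individual subject."""
    if subjects is None:
        return GROUP_TEST_MIN / 60
    return subjects * INDIVIDUAL_TEST_MIN / 60
```

The preparation total comes to 115 minutes, which the text rounds to two hours.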


When tests are administered to a group, this method requires three hours' work
by an individual investigator for each dialect. This is not much more time
consuming than a conventional lexicostatistic survey. The essential difference is
that the intelligibility survey requires that each village be visited twice, first
to collect the test tapes, and second to administer them, whereas the lexicostatistic
survey requires only one visit. However, a two pass lexicostatistic survey can give
much more reliable results than a one pass survey. This is because the investigator
has the opportunity to compare the word lists after they are all collected and then
in a second visit to re-elicit items which appear to have been elicited incorrectly.
Therefore this intelligibility approach fits very nicely with a lexicostatistic
approach for the initial linguistic survey in an area. Actually, once comparative
word lists are collected, analysis of phonostatistics, phonological correspondences
between dialects, and lexical and phonological isoglosses can be made without
collecting any additional data. When a computer is available, this added wealth of
information is almost free in terms of the investigator's time (Simons 1977b
describes a set of computer programs which can be used).


Provided there is not more than three hours' travel time between test points it
would be possible for a single investigator to conduct the study of linguistic
comparison and intelligibility for two test points in a single day. With a two man
team the work becomes even easier, with one member concentrating on the linguistic
side of the study and the other concentrating on the intelligibility side.

2.2 A review of intelligibility testing methods

This review of intelligibility testing methods is made in chronological order.
First, in Section 2.2.1, the early studies in the 1950's are considered. Then Hans
Wolff's 1959 critique of these early studies is reviewed in Section 2.2.2. This
critique led to refinements in the method by a group of investigators from the
Summer Institute of Linguistics in Mexico. Section 2.2.3 treats their method.
Finally, Section 2.2.4 presents other recent methods.



2.2.1 The early studies 

The method of intelligibility testing has its origins in a 1951 article by Carl
F. Voegelin and Zellig Harris. They proposed intelligibility testing as a means of
measuring dialect differences, in hopes that it could help define the border between
dialect and language. Their main interest was in classifying languages rather than
in communication itself. They discussed four methods which could be used to
distinguish language from dialect: (1) ask the informant, (2) count samenesses, (3)
structural status, and (4) test the informant. It is their "test the informant"
method which has developed into the intelligibility testing techniques discussed
here. Basically, their method was this: make a tape recording in dialect A and see
how well speakers in dialect B can understand it. Voegelin and Harris suggested
measuring understanding by noting the accuracy with which speakers of dialect B
could translate the text.

Hickerson, Turner, and Hickerson (1952) were the first to use the Voegelin and
Harris method of testing the informant in a field study. They refined the sketchy
outline of the method given in the original paper to determine relationships among
seven Iroquois languages of North America. A second intelligibility survey was
conducted soon afterwards by Pierce (1952) among Algonquian languages of North
America. Later Biggs (1957) conducted a similar survey among the Yuman languages of
North America.

All three of these surveys used basically the same method. The investigators
obtained a translation of the original text and then scored the subject's
translation of that text to arrive at a percentage of items which were correctly
understood and translated. In the first study, the investigators took down an exact
translation of the text from its teller, and scored section by section translations
of the text by subjects as incorrect, one-third, two-thirds, or fully correct.

Pierce (1952) used what he called a "standard grading translation". Rather than
trying to obtain an exact, morpheme by morpheme transcription and translation, he
obtained a running translation into English from the person who told the story and
from two other speakers of the same community. He then compared these three
translations to construct the standard grading translation which listed the main
semantic units in each sentence of the text. A subject's translation of the text
was scored as correct, incorrect, or half correct for each unit. Pierce also
recognized the importance of the hometown test as a measure of a subject's abilities
and was the first to suggest that it could be used in adjusting raw intelligibility
scores to control for differing subjects' abilities. Biggs (1957) was the first to
administer tests to groups of subjects.
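
A percentage score of the kind used in these early surveys can be sketched as
follows. This is an illustration only: it assumes that the percentage is the mean
of the per-unit scores (1 for correct, 0.5 for half correct, 0 for incorrect), which
the text does not state explicitly, and the function name and sample scores are
hypothetical.

```python
def percent_understood(unit_scores):
    """Percentage understood, given one score in [0, 1] per semantic unit."""
    if not unit_scores:
        raise ValueError("at least one scored unit is required")
    if any(not 0 <= score <= 1 for score in unit_scores):
        raise ValueError("unit scores must lie between 0 and 1")
    return 100 * sum(unit_scores) / len(unit_scores)


# Hypothetical scoring of one subject's translation against eight units.
sample_units = [1, 1, 0.5, 0, 1, 0.5, 1, 1]
sample_percent = percent_understood(sample_units)
```

The same function covers the one-third and two-thirds scoring of the first study,
since any fraction between 0 and 1 is accepted.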

In these three studies, as well as in the original proposal of Voegelin and
Harris, the main emphasis or perspective was language classification. They were not
interested in the intelligibility scores as a measure of communication as much as
they were interested in using intelligibility to measure "dialect distance" (Pierce
1952, Biggs 1957), the degree of relatedness between speech groups. Because of
this they took the asymmetry out of intelligibility test results by computing what
they called a percentage of "mutual" intelligibility, which averaged the amount of
information flow in both directions between a pair of dialects.
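
The effect of this averaging can be seen in a small Python sketch. The function
names and the two scores are hypothetical; the point is only that the average hides
whatever difference exists between the two directions.

```python
def mutual_intelligibility(a_understands_b, b_understands_a):
    """Average of the two directional percentage scores for a dialect pair."""
    return (a_understands_b + b_understands_a) / 2


def asymmetry(a_understands_b, b_understands_a):
    """Difference between the two directions, which the average discards."""
    return a_understands_b - b_understands_a


# Hypothetical pair: speakers of A understand 80% of B, speakers of B 40% of A.
mutual = mutual_intelligibility(80, 40)
lost = asymmetry(80, 40)
```

A pair scoring 60 in both directions would yield the same "mutual" figure as the
pair above, although communication between them is quite different.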

2.2.2 Wolff's critique 


In 1959, Hans Wolff wrote a criticism in which he questioned the validity of
using intelligibility testing to measure "dialect distance". In his paper he makes
the following criticisms of the method (this list follows Yamagiwa 1967:14-15):

(1) The method seems to measure primarily the subject's ability to translate.
While ability to translate obviously presupposes some type of intelligibility, the
reverse is not necessarily true.

(2) The translation is made into a third language, thus introducing an
additional uncontrollable factor.

(3) The subject may dislike the notion of having to produce a translation.

(4) The subject's reaction to hearing speech from a lifeless box rather than in
a normal sociolinguistic situation constitutes another uncontrollable variable.

(5) The subject's psychocultural reaction to a different form of speech and
possibly to the people who customarily speak it may enter into the testing.

(6) Dialect distance can be tested effectively only if the non-native dialects
have not been learned.

(7) The test does not permit us to distinguish between intelligibility due to
linguistic proximity alone and that which is due to some kind of learning process.

(8) The test yields little useful information when we are faced with the
baffling phenomenon of nonreciprocal intelligibility.

Wolff went on to discuss the cultural factors involved in communication between
different dialects, illustrating with four examples from Nigeria. He concludes that
although linguistic proximity may play a limiting or boosting role in communication,
the decisive factors are cultural. Thus intelligibility tests could not be a valid
means of measuring linguistic proximity.

While all of the points made by Wolff are basically correct, he made a few
oversights which render many of his criticisms vacuous. In the first four points he
spoke of uncontrollable factors which affect the results of the intelligibility
tests: the subject's ability to translate, the subject's proficiency in a third
language, the subject's dislike for having to produce a translation, and the
subject's reaction to hearing speech from a lifeless box. Here Wolff's use of the
word "uncontrollable" is incorrect. What he means is "unmeasurable". It is true
that the subject's translation ability, his bilingual ability, and his attitude
toward the test situation cannot be easily measured. However, these factors can be
controlled for and an attempt to do so was made in the early intelligibility
studies.



The hometown test (the test on the subject's own dialect) was used in the early
intelligibility tests for this purpose. Pierce (1952:206-7) goes to some length to
explain how the hometown test can serve as an experimental control for these
immeasurable factors. Presumably, an informant should score 100% understanding of
his own dialect. Any difference between the observed score and 100% can be
attributed to the factors above: a lack of ability in the translation language, a
lack of skill in translation, or a reaction against the test situation. Pierce
suggests we can assume that these same kinds of deficiencies which affected the
subject's translation of his own dialect will also affect his translation of the
other dialects. Pierce goes on to say that if all of a subject's scores are divided
by his score on his own dialect, the unmeasurable factors cancel each other out. As
a result the score on his own dialect will be raised to 100% and all other scores
will be raised by a proportional amount. All such scores between different
informants are comparable because the effects of differing levels of subject ability
have been compensated for. As Pierce shows, we may not be able to measure exactly
the effect of translation skill or a third language skill in the test results, but
the fact that we divide the one score by the other cancels out their effects and
what remains is the measure of intelligibility. Thus Wolff failed to recognize the
significance of the hometown test as an experimental control in the intelligibility
test design.
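
The adjustment Pierce describes amounts to dividing every raw score by the
hometown score. A minimal Python sketch follows; the function name, the dialect
labels, and the raw scores are hypothetical, and the sketch assumes scores are
expressed as percentages.

```python
def adjust_for_hometown(raw_scores, hometown):
    """Divide each raw percentage by the subject's hometown percentage.

    raw_scores maps a dialect label to the subject's raw score on that
    dialect's test tape; hometown is the label of the subject's own dialect.
    """
    home_score = raw_scores[hometown]
    if home_score <= 0:
        raise ValueError("hometown score must be positive to adjust by it")
    return {dialect: 100 * score / home_score
            for dialect, score in raw_scores.items()}


# Hypothetical subject from dialect A who scores only 80% on his own dialect.
raw = {"A": 80, "B": 60, "C": 20}
adjusted = adjust_for_hometown(raw, "A")
```

After adjustment the hometown score is 100% and the other scores rise in
proportion, which is what makes scores from different subjects comparable.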

The eighth point above, that the test yields little useful information when we
are faced with the baffling phenomenon of nonreciprocal intelligibility, is not a
criticism of intelligibility testing itself, but rather is a criticism of the way
the early studies interpreted the results of the test. Pierce (1952) and Biggs
(1957) disregarded the asymmetry in intelligibility relations. Since "dialect
distance", the relation which they were trying to measure, is symmetrical, they
chose to compute a "percentage of mutual intelligibility" as a measure of dialect
distance. This percentage was the average of the score in each direction between
two dialects. The fault here was not in their method of measuring intelligibility
but in their assumption that it should be "mutual" when in fact it is not.

The remaining three criticisms are again a criticism not against the method of
testing intelligibility, but against the way in which the original investigators
interpreted and applied their results. Wolff was arguing that intelligibility
scores not only tell us something about the linguistic distance between two dialects
but they also tell us something about the social relations between the dialects.
This relation could be manifest in attitudes which would result in a negative kind
of reaction against the test tape (point number 5), or in favorable kinds of
relations that could result in the learning of different dialects (points 6 and 7).




With four examples from Nigerian languages Wolff goes on to show that
intelligibility measures both linguistic relations between dialects and social
relations. Wolff left off his argument at that point. It follows, however, that if
the investigator can demonstrate that the social relations between dialects are
absolutely nil, then the measure of intelligibility can be viewed as reflecting only
linguistic relations. In such a case, intelligibility scores may have value as
offering a composite measure of phonological, lexical, grammatical, and semantic
relations between dialects. It was such an understanding that motivated Biggs
(1957:59) to screen his subjects and discount any subject who had had extensive
prior contact with the language being tested.

An analysis of Wolff's criticism of intelligibility testing as it was practiced
in the 50's indicates that he actually made no legitimate criticisms against the
method that was being employed to measure intelligibility, only against the way the
results were interpreted. The real value of his paper is in demonstrating with some
very good examples that intelligibility measures not only linguistic relations but
also social ones. Thus we must be extremely cautious in interpreting
intelligibility scores as a measure of dialect difference.



2.2.3 Casad's method



In the early 60's, John Crawford began adapting the methods of intelligibility
testing (Casad 1974:58ff). He agreed with Wolff's criticism of the way the early
intelligibility tests had been administered and interpreted. However, he reasoned
that if the actual testing technique were improved, intelligibility scores could be
used as quantitative measures of the amount of information transfer between
dialects. His interest was not in dialect distance, but rather in how widely a
dialect could be used in vernacular literature and education programs. Papers by
Bradley (1968) and Kirk (1970) are initial reports on Crawford's refined technique
and its application in a number of projects by the Summer Institute of Linguistics
in Mexico. They took his ideas and continued to refine the techniques of
intelligibility testing while conducting field surveys of various language groups.

Three works which have recently come out of that project give extensive
coverage of dialect intelligibility testing. Casad (1974) has written a thorough
manual on how to conduct a survey and how to interpret the results. In addition he
gives historical and critical reviews of the method and discusses alternative
techniques. His book is an invaluable source on the topic of dialect
intelligibility testing. Stoltzfus (1974) treats the problem of designating certain
dialects as centers for indigenous literature programs and then supports the
discussion with analyses of six dialect surveys conducted in Mexico. Grimes (1974)
concentrates on the methods used to analyze the survey data and convert them to
decisions on dialect groupings and centers.

In this approach, Wolff's major criticism, that intelligibility scores are not
valid measures of dialect distance, is bypassed by viewing the scores strictly as
measures of information transfer, not dialect distance. Here the investigators are
interested in determining the extendability of vernacular literature produced in any
given dialect. The social factors which affect intelligibility, such as negative
feelings that limit communication or good relations that boost communication, are
also likely to limit or boost the extendability of literature in the same way. Thus
intelligibility, taken as a composite measure of both linguistic and social
relations, measures exactly what they were looking for.

These investigators also tried to combat the other aspects of the original test
design which Wolff criticized. Wolff criticized the method because it tested a
subject's ability to translate as much as it did his understanding. Thus the method
was changed so that subjects are not required to translate a passage, but to answer
specific questions about the intent of a passage. As a convention, a text is
divided into ten segments and a question is asked for each one. Moreover, Wolff
complained that the subjects were required to make a response in a third language.
In this refined technique the questions and the subject's answers to these questions
are given in his own vernacular. Again, Wolff felt that the subject's reaction
against the test situation itself and against the methods and equipment of the
investigators could introduce an uncontrollable variable into the test results. In
this refined technique the first tape which any subject listens to is an
introductory tape in his own dialect. This tape first introduces the investigators
and explains their purpose, then it explains how the testing will be done and gives
a short sample test in which questions are asked and the correct responses are given
for an example. This introductory tape is meant to relax the subject and
familiarize him with the investigators, their techniques, and their equipment.

Casad summarizes the steps in preparing the test tapes for each dialect as
follows (1974:100):

The survey team must complete the following series of steps at each
test point: (1) elicit and transcribe an adequate text, (2) formulate a
set of questions from the translation of that text, (3) translate the sets
of questions for all the test tapes into the local dialect, (4) prepare an
introduction tape, (5) submit the translations of the questions to a
pre-test panel of speakers of that dialect in order to detect and correct
translation errors, (6) make a dubbed copy of a hometown text for
constructing the hometown test tape, and (7) record the translated
questions and the introduction tape. ... This preparation entails a day's
work.

To administer the tests requires another day. The tests are administered to
individual subjects and as a convention, Casad suggests that ten subjects be tested.
About 45 minutes are required to administer a set of test tapes to a single subject
(Casad 1974:24). That amounts to seven and a half hours for ten subjects.
Furthermore, the method requires a survey team of two members (1974:3). The total
requirement for each test point is then two investigators for two days, or four
man-days.

2.2.4 Other recent methods

In this section, four other recent methods are considered. In each case these
investigators studied intelligibility to learn about communication between speech
communities and not to estimate dialect distance. Thus they sidestepped the brunt
of Wolff's criticism. In the first three methods the investigators used written
tests to increase the efficiency of data collection. In the second, third, and
fourth methods the investigators used translated texts to control for variation in
the difficulty and subject matter of the test materials.

Yamagiwa (1967) studied intelligibility among Japanese dialects by
administering tests to sixty-five university students and graduates. Because of the
academic sophistication of his subjects, he was able to have the subjects make
written translations of the texts they heard. The students heard short portions of
speech twelve to twenty seconds long recorded in ten different dialects. They heard
three repetitions of each dialect sample and were asked to write out a translation
of each. These translations were compared to a standard translation to score the
amount of understanding. The advantage of this written translation method is
twofold: tests can be administered in a classroom kind of setting so that a large
number of subjects can be tested at one time, and each subject transcribes his own
translation so that the investigator is not left with recordings that he must later
transcribe and score. The effect is that the investigator can collect many times
more information in much less time than is possible by the methods in Sections 2.1
and 2.2.3. Of course such a method is limited to a very restricted kind of
subject.



The next two studies come out of the "Survey of Language Use and Language
Teaching in Eastern Africa" project. They are the study of intelligibility among
Sidamo languages of Ethiopia by Marvin Bender and Robert Cooper (1971) and tests
with speakers of two Bantu languages of Uganda by Peter Ladefoged (1968,
Ladefoged and others 1972). Bender and Cooper studied intelligibility among six
languages. Six separate stories about everyday topics were translated into each of
the six languages, giving thirty-six passages. These were then spliced into six
test tapes, with one story per language on each tape. The order of the languages
was different on each tape. The tests were administered to sixth grade school
children in the classroom. On the test tapes each story (averaging 175 words) was
followed by three questions with four multiple choice responses each. The questions
and their alternative responses were printed in test booklets in Amharic. The
students heard each story with its questions and responses two times before marking
the response in the test booklet. Before the actual testing began there were three
practice exercises. Bender and Cooper report that it took 45 minutes to administer
a set of six tests, including practice. That is, they were able to test a whole
classroom full of literate subjects in the same amount of time that one subject can
be tested by the methods reviewed in preceding sections.

The use of translated stories has the disadvantage that the investigator cannot
insure the naturalness of the texts. On the other hand, it has the advantage that
one can experimentally control for the differing difficulties of the texts. They
translated six stories into six languages, or thirty-six passages. These were
arranged in six test tapes including one passage for each language. Students to
take the tests were divided into six groups and each group heard a different test
tape. The result was that each group heard a different story from the same
language. The sum of responses for all six groups on a particular language is a sum
over all six stories. The total responses on each of the six languages are a sum
over the same six stories, and therefore the intelligibility totals are based on
identical texts and questions. This is not true of the other methods considered so
far. This particular kind of experimental design is called a Latin Square design.
Coupled with the statistical method of analysis of variance, it can be used to test
hypotheses concerning the relative effects which different groups of subjects,
different stories, different languages, and the ordering of the tests on each tape
have in explaining the observed differences in intelligibility.
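
The pairing of stories with languages across tapes can be sketched in Python as
follows. The cyclic construction used here is only one way to build a Latin square
and is an assumption of this sketch; the text does not say which arrangement Bender
and Cooper actually used. The function name is hypothetical, and the varying order
of languages on each tape is not modelled.

```python
def latin_square_tapes(n):
    """Return tapes[t][l]: the index of the story told in language l on tape t.

    A cyclic shift guarantees that every tape carries each story once and
    that every language is heard with each story exactly once across tapes.
    """
    return [[(tape + language) % n for language in range(n)]
            for tape in range(n)]


# Six stories, six languages, six tapes, as in the design described above.
tapes = latin_square_tapes(6)

# Stories heard in language 0 across the six tapes (one per group of students).
stories_for_language_0 = [tapes[tape][0] for tape in range(6)]
```

Because each column of the square contains every story, the total for any one
language is a sum over the same six stories as the total for any other language.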

Ladefoged used basically the same method as Bender and Cooper with translated
stories, Latin Square design, and written tests among school children. His tests
were simpler in that the stories were shorter and the subjects were required to
answer only one multiple choice question on what each story was about. Each
question had three possible responses.

Gillian Sankoff (1968:151-5) used translated texts to test intelligibility
among the Buang of Papua New Guinea using an oral method. She made tests for six
languages. First, she composed six short stories having to do with daily life in
the village. The English text of each was about 100 words long. Then each story
was translated and recorded on tape in each of the six languages, resulting in
thirty-six taped stories. Each subject then heard a different combination of texts
in the different languages, though the order of the languages was kept constant. In
administering the tests, an individual subject listened to a story once. Then the
investigator personally (rather than on the test tape) asked three questions about
it in the vernacular. The questions were phrased so as to have brief answers. An
answer was scored 2 for completely correct, 1 for partially correct, and 0 for
wrong. The result on a test ranged from 0 to 6. In the three dialects she tested
16, 20, and 48 subjects. A drawback of the method was that subjects tended to
forget items which they actually understood, as evidenced by the fact that subjects
averaged 70% understanding of their hometown dialect test.

A second part of the test was designed to measure comprehension of vocabulary
items in the texts. Ten content words were selected from each story for the test of
vocabulary items. After the subject had listened to the story and answered the
questions the text was played again, stopping the tape at the ten selected words.
The subject was then asked to translate the word into his own dialect. A response
was scored as correct or incorrect, thus 10 points were possible for the test.
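
Sankoff's two scores for a single story can be sketched as below. The function
names and the sample responses are hypothetical; only the scoring rules (three
answers worth 0 to 2 points each, and ten vocabulary items scored right or wrong)
are taken from the description above.

```python
def comprehension_score(answers):
    """Sum of three answers scored 2 (correct), 1 (partial), or 0 (wrong)."""
    if len(answers) != 3 or any(a not in (0, 1, 2) for a in answers):
        raise ValueError("expected three answers, each scored 0, 1, or 2")
    return sum(answers)


def vocabulary_score(responses):
    """Number of the ten selected words translated correctly."""
    if len(responses) != 10:
        raise ValueError("expected responses for exactly ten words")
    return sum(1 for correct in responses if correct)


def percent_of_maximum(score, maximum):
    """Express a score as a percentage of the maximum possible."""
    return 100 * score / maximum


# Hypothetical subject: one correct, one partial, and one wrong answer,
# and seven of the ten vocabulary items translated correctly.
story_points = comprehension_score([2, 1, 0])
word_points = vocabulary_score([True] * 7 + [False] * 3)
```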



2.3 A taxonomy and evaluation of intelligibility testing methods 

The methods of intelligibility testing which have thus far been presented are
now classified according to six dichotomies. The alternate approaches within the
dichotomies are evaluated in terms of optimality and relation to the investigator's
goals. As a conclusion the relation between the abilities of the potential subjects
and the methods of testing is considered. The results show that no one method of
testing intelligibility is inherently better than another. Rather, the goals of the
investigator and the capabilities of the subjects work together to define a method
which is best for a situation. It is hoped that the following discussion will serve
as a guide to those who must plan a dialect intelligibility survey.



2.3.1 A taxonomy of intelligibility testing methods

Methods of intelligibility testing can be classified according to the following
six dichotomies:

(1) Language of response = (Vernacular, Common)

(2) Mode of response = (Oral, Written)

(3) Format of test = (Question, Translation)

(4) Scoring method = (Quantitative, Qualitative)

(5) Source of text = (Elicited, Translated)

(6) Sampling method = (Groups, Individuals)

That is, (1) subjects may be asked to respond in their vernacular language or in
some language such as the national language or trade language which is common to
them and the investigator, (2) subjects may be asked to speak their responses or
write them, (3) the test may be formatted so that subjects are asked to respond by
answering questions about the text or by translating it, (4) understanding may be
scored quantitatively (as a percentage, for instance) or qualitatively (as being
adequate or not adequate, for instance), (5) the texts for tests may be elicited
narrative or may be translations of pre-written texts, and (6) the subjects may be
sampled as a group or as individuals. Figure 2.1 sets out a table in which the
methods discussed thus far in the chapter are classified as to their values for each
of the six dichotomies.
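
The six dichotomies can be represented as a small record type, sketched here in
Python. All class and field names are hypothetical. The two sample classifications
are a reading of the descriptions in Sections 2.2.3 and 2.2.4 and are not copied
from Figure 2.1, which is not reproduced here.

```python
from dataclasses import dataclass, fields
from enum import Enum

Language = Enum("Language", "VERNACULAR COMMON")
Mode = Enum("Mode", "ORAL WRITTEN")
Format = Enum("Format", "QUESTION TRANSLATION")
Scoring = Enum("Scoring", "QUANTITATIVE QUALITATIVE")
Source = Enum("Source", "ELICITED TRANSLATED")
Sampling = Enum("Sampling", "GROUPS INDIVIDUALS")


@dataclass(frozen=True)
class Method:
    """One intelligibility testing method, classified on six dichotomies."""
    language: Language
    mode: Mode
    test_format: Format
    scoring: Scoring
    source: Source
    sampling: Sampling


def differing_dimensions(a, b):
    """Names of the dichotomies on which two methods take different values."""
    return [f.name for f in fields(a)
            if getattr(a, f.name) != getattr(b, f.name)]


# Classifications inferred from the prose descriptions (an assumption).
CASAD = Method(Language.VERNACULAR, Mode.ORAL, Format.QUESTION,
               Scoring.QUANTITATIVE, Source.ELICITED, Sampling.INDIVIDUALS)
SANKOFF = Method(Language.VERNACULAR, Mode.ORAL, Format.QUESTION,
                 Scoring.QUANTITATIVE, Source.TRANSLATED, Sampling.INDIVIDUALS)

casad_vs_sankoff = differing_dimensions(CASAD, SANKOFF)
```

On this reading the two methods differ only in the source of the text, and the six
two-way choices allow 64 combinations in all, most of which have not been tried.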

2.3.2 An evaluation of intelligibility testing methods

There are many possible ways to test intelligibility. The methods classified
in Figure 2.1 are ways that have been used; other combinations of the six variables
could be proposed. In this section I argue that no one method is inherently better
than another. This observation is borne out in Section 5.5 where it is shown that
different methods give essentially the same result. The choice between
methods is therefore based on restrictions caused by the abilities of the subjects
(see Section 2.3.3) and by the investigator's goals. Where choices still remain,
the decision is based on a criterion of optimality. I define an optimal method as
one which allows the investigator to gather the greatest amount of information
possible with the least amount of effort possible. Each of the six dichotomies is
now considered in turn.

(1) Language of response - The subjects can respond in their vernacular
language or in a common language, such as the national language or a trade language.
In a question approach, the same language will also be used to formulate questions.
Where a common language can be used (that is, where the subjects are adequately
bilingual), the common language approach is optimal. This is because it requires
the least amount of effort to prepare test tapes. In a vernacular approach, such as
Casad's (Section 2.2.3), it is necessary to construct a new test tape for each
village where a text will be tested. The text remains the same but the questions
that go with it must be translated into the local dialect and dubbed in to create a
new test tape; otherwise the difficulty of understanding the questions themselves
compounds the difficulty of understanding the text. With the common language
approach, however, the same test tape is used in every village where a particular
dialect is tested. This saves time as well as increasing consistency, since subjects
in different villages hear the same test tape instead of different versions of it.

Figure 2.1 Methods of testing intelligibility

One of the problems associated with selecting a common language approach is
ensuring that the subjects are adequately bilingual in the common language. When a
common language approach is used, the hometown test serves as a control for
bilingual abilities. If a subject's score on the hometown test does not near 100%,
this may indicate that he is not sufficiently bilingual to take the test.
However, if a subject does score nearly 100%, then his bilingual abilities are not
at issue. Thus the hometown test serves to validate the assumption that the
subjects are sufficiently bilingual to be tested in the common language. However,
if it turns out that many potential subjects are disqualified from further testing
because they cannot respond adequately on the hometown test, then the use of the
common language may bias the results. If the investigator still wants to use a
common language approach, then it may be necessary to do an in-depth study in one or
two villages with a vernacular method to see if a common language approach for the
entire survey would be valid. For instance, Gillian Sankoff (1968:168-9) used a
vernacular approach to test 48 individuals from Mambump village, Papua New Guinea,
on their understanding of related dialects and on three languages of wider
communication (including New Guinea Pidgin). She found that although women scored
significantly lower than men on the Pidgin test, there was no significant difference
in their scores on related dialects. Thus I would suggest these results indicate
that in this area one could use a common language approach with Pidgin which tested
only men, and not bias the results by excluding women from the sample.

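
The use of the hometown test as a screen for bilingual ability can be sketched in a
few lines of Python. This is an illustrative addition under stated assumptions: the
function names are hypothetical, the scores are invented, and the cutoff of 90% is
only a stand-in for "nearly 100%", for which the text gives no exact figure.

```python
def screen_by_hometown(hometown_scores, threshold=90.0):
    """Split subjects into qualified and disqualified by hometown test score.

    hometown_scores maps a subject label to a percentage score (0-100).
    threshold is an assumed cutoff standing in for "nearly 100%".
    """
    qualified = {s: v for s, v in hometown_scores.items() if v >= threshold}
    disqualified = {s: v for s, v in hometown_scores.items() if v < threshold}
    return qualified, disqualified


def disqualified_fraction(hometown_scores, threshold=90.0):
    """Fraction of potential subjects that the hometown test screens out."""
    if not hometown_scores:
        raise ValueError("no subjects given")
    _, disqualified = screen_by_hometown(hometown_scores, threshold)
    return len(disqualified) / len(hometown_scores)


# Hypothetical scores, invented purely for illustration.
scores = {"A": 100, "B": 95, "C": 60, "D": 90, "E": 40}
qualified, disqualified = screen_by_hometown(scores)
print(sorted(qualified))              # ['A', 'B', 'D']
print(disqualified_fraction(scores))  # 0.4
```

A large disqualified fraction is the warning sign discussed above: the subjects who
remain may no longer represent the population, and a vernacular check in one or two
villages may be called for.
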

Another way in which a common language approach is optimal is that it does not
require that the investigator be familiar with the vernacular language. Casad
suggests that for his method one of the investigators should speak at least one of
the dialects in the language area under study (1974:3). A method with that
requirement means that intelligibility cannot be tested in a new linguistic area
until someone has spent a number of months, perhaps a few years, learning a
language. Joseph Grimes (personal communication) feels that until there is such an
investigator, intelligibility is too fine grained a phenomenon to measure. I do not
think so. There should be methods of testing intelligibility (some are suggested in
Section 2.3.4) which allow a team of survey technicians to go into any new area and
assess the intelligibility situation (as well as the linguistic and social
relations, Chapters 5 and 6) so that wise decisions about language planning can be
made before personnel are actually assigned to in-depth study of languages in the
area. The in-depth study may later suggest changes in strategy, but they will
not be as drastic as they would have been had initial decisions been based on
linguistic relations alone. A common language approach, where it is applicable,
means that the intelligibility situation can be measured immediately without having
to wait months or years for the investigator to gain proficiency in the local
vernacular. Where the common language approach cannot reach all of the population
but yet a sizeable portion of it, then it is still valid to use it with the
understanding that it gives better results than no survey would, and that a
vernacular approach will be used at a later time to refine the analysis of the
dialect situation.

(2) Mode of response - The subjects may respond by speaking or by writing.
Where it can be used, the written approach is certainly optimal. Bender and Cooper
(1971) could simultaneously test all of the students in a classroom on six written
tests in 45 minutes (Section 2.2.4). Casad (Section 2.2.3) and I (Section 2.1)
could administer such a battery of oral tests to only one individual in the
same amount of time. Where the level of writing skill is high enough not to form a
barrier of its own, written tests yield a much higher return for a given amount of
effort than oral tests.

(3) Format of test - The test may be formatted so that subjects can respond by
answering questions about the content of the text or by translating it. In this
case, we cannot really claim one method is optimal over another; this depends on the
methods being used alongside it and the investigator's goals. A translation
approach is optimal as compared to a question approach in which questions are dubbed
into the test tape because it is simpler to prepare the test tapes for the
translation test. On the other hand, it is much easier to score a question
test quantitatively than it is to score a translation test. In the case of the
method I suggest in Section 2.1, the translation approach turns out to be optimal
since a qualitative method of scoring is used. There is a tradeoff here between the
two aspects of optimality: information and time. A translation approach measures
understanding of every item in the text, not just selected points of content, and
thus yields more information. A question approach, on the other hand, yields its
information with much less effort since it does not require item by item comparison
of translations.

The choice between translation and question approaches is partly one of
sampling. In a question approach, the investigator is sampling from the text. He
is concentrating on a few points of content and trying to generalize to the whole
text. The problem is essentially, "What is the likelihood that the subject's
responses to the selected questions are a good indication of his understanding of
the whole text?" The accuracy with which the questions reflect the whole text is
involved here. The questions may happen to hit only items which are similar between
the dialects, or only items which are dissimilar. The more questions that are
asked, the less likely the sample will be biased.

The phrasing of the questions will also affect the responses. That is, a subject may
have understood something but the question does not bring out that fact.
[...] conducted by my wife and me among dialects of the Biliau language, in
Papua New Guinea (L. Simons 1977:250), we found that the most often missed questions
were [...] questions. The second most often missed questions were "how" questions.
The third most often missed were questions where the answer was rather far from the
[...] segment of text. We thus found that questions can be simple or difficult
depending on how they are phrased and where their answers are found with respect to
[...] the test tape. These phenomena are independent of the subjects'
understanding of the text. However, they will surface in the responses given by the
subject. These factors can be controlled for by adjusting all of a test's
scores on the basis of its hometown score (Section 5.2.4).

When a translation approach is used, understanding of the whole text is tested.
This avoids the sampling problem of how well understanding of the items questioned
measures understanding of the whole text. However, it still does not avoid one
serious sampling problem which affects all methods of intelligibility testing. This
is summed up in the question, "What is the likelihood that a subject's
understanding of this text is a good measure of his understanding of the whole
language?"

(4) Scoring method - A subject's understanding of the text can be scored
quantitatively or qualitatively. In a quantitative method, the number of items
correctly translated or the number of questions correctly answered is added up and
this number is the score. These scores are generally converted to
percentages. In a qualitative method the investigator does not count the
[...]. Rather, he judges it along some discrete scale of levels, for
instance, adequate or not adequate; full intelligibility, partial intelligibility,
or no intelligibility. Again we cannot pronounce one method optimal in all cases.
With translation approaches the qualitative approach is optimal in the sense that it
requires less effort; however, a quantitative approach opens up a broad range of
statistical methods that can be used in the analysis of results. With question
approaches, the questions may not provide a large enough sample of the text to allow
a qualitative judgment. For instance, the written methods of Bender and Cooper
(1971) and Ladefoged (1968) presented in Section 2.2.4 used only three questions and
one question respectively. It would be impossible to base qualitative judgments on
such small samples.

Of all the methods classified in Figure 2.1, the method I suggest in Section
2.1 is the only one which uses a qualitative method of scoring. Since such a method
of scoring has not appeared widely in the literature it would be good to discuss it
here. Qualitative scoring is of advantage because the scores have an interpretable
meaning in the real world. Also, qualitative scoring avoids one of the problems of
quantitative scoring: overprecision.

When the investigator scores intelligibility qualitatively, he knows what the
scores mean and how they should be interpreted in applying intelligibility test
results. For instance, if intelligibility is scored on a simple dichotomy as
adequate or not adequate, the investigator knows which intelligibility relations are
adequate for establishing dialect groupings and which are not (Chapter 3). When
intelligibility is scored quantitatively, however, the step of interpreting scores
for applying results still lies ahead. What does it mean if subjects score 70%
intelligibility? In the early methods (Section 2.2.1) it means that a subject was
able to correctly translate 70% of the text. In Casad's method (Section 2.2.3) it
means that on average, the subjects answered 7 out of 10 questions correctly. In
Ladefoged's method (Section 2.2.4) it means that 70% of the subjects answered the
one question correctly. To go from these measurements of 70% to what they mean in
terms of levels of communication adequacy for a vernacular language program (Section
3.1) still requires a step of subjective interpretation. When scoring is done
qualitatively this subjective interpretation occurs at the test site as the test is
administered rather than weeks or months later when the results are analyzed.

A potential pitfall of quantitative scores for the unwary investigator is that
they are overprecise. That is, the percentage scales which have customarily been
used appear to discriminate 100 degrees of intelligibility. In actual fact they do
not. Statistical tests of significance show that even 10% differences in measured
intelligibility need not be significantly different. In an appendix to Casad's
manual (1974:167-173) the standard deviations as well as the means for
intelligibility scores from the Mazatec survey are reported. A one-tailed t test
shows that Tenango's hometown score of 95% (6.71% standard deviation) is not
significantly greater at a 95% confidence level than the score of 87% (12.69%
standard deviation) which another test point, TE, scored on Tenango. TE's score of
46% on the Jalapa test is not significantly greater at a 95% confidence level than
MZ's score of 35%. On the other hand, TE's score of 90% on San Jeronimo is
significantly greater at a 95% confidence level than HU's score of 76%.
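
A comparison of this kind can be computed from the reported means and standard
deviations alone. The Python sketch below is an illustrative addition: it uses
Welch's unequal-variance form of the t test, which is an assumption, since the
text does not say which form of the test was applied. The sample size of ten
subjects per test point is likewise an assumption, taken from the description of
Casad's method elsewhere in this chapter. The function name welch_t is hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for Welch's unequal-variance t test from summary statistics.

    t is positive when mean1 exceeds mean2; df is the Welch-Satterthwaite
    approximation to the degrees of freedom.
    """
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Tenango hometown score versus TE's score on the Tenango test.
# The sample size of ten per test point is an assumption.
t, df = welch_t(95.0, 6.71, 10, 87.0, 12.69, 10)
print(round(t, 2), round(df, 1))
```

The resulting t value is then compared with the one-tailed critical value for the
computed degrees of freedom, read from a table of Student's t distribution. The
verdict depends on the sample sizes and on the form of the test assumed here.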

These tests of significance are actually testing the hypothesis that one
group's score on a test is greater than another group's score on the same test.
They are not testing the hypothesis that one group's intelligibility of a dialect is
greater than another's. To test the significance of the difference between a
group's score on one test and its score on a second test may not even be possible
since the tests are different. We would have to know how the two tests compared
with respect to a language sampling distribution. The significance tests made above
take into account only the variation in subject sampling. To make inferences about
intelligibility, not just test scores, we would have to take into account variation
in language sampling as well. Unfortunately we have no way of measuring this.

Casad, in an appendix (1974:173), suggests that we might do better to state
results in terms of range estimates rather than point estimates. Grimes, in a
footnote (1974:262), suggests that decisions concerning intelligibility test results
"should ultimately be based on tests of the significance of the differences between
two ranges rather than on the simple greater-than, less-than relationship between
two numbers." Both are correct; unfortunately, these suggestions have yet to be
implemented in field studies.


The use of qualitative scoring techniques offers a way out. On a qualitative
scoring scale, all the levels of intelligibility are significantly different since
there are so few levels, generally fewer than five. When there are no significant
differences in the distribution of intelligibility within the whole population, then
the qualitative level of intelligibility is reported, and there is no range or
distribution to report. If there are significant differences in the distribution,
then that distribution is reported; for instance, half of the population understands
at a level of partial intelligibility and the other half understands at full
intelligibility. Such a statement is easier to interpret than one with a percentage
and standard deviation; for instance, the average degree of understanding in the
population is 75% with a standard deviation of 20%. Of course, with such a reduced
number of levels on a qualitative scale the problem of borderline cases can arise;
that is, cases which are simultaneously not significantly different from two
adjacent levels. In these cases, we should probably not treat them as different
from either level, but as occurring in both, for analysis purposes (Chapter 3).

Note that using a qualitative scoring technique does not give the same result
as using quantitative scoring and then reducing the results down to a four or five
point scale. The latter method depends on finding discrete breaks in the
distribution of test scores or simply rounds scores without regard to breaks; the
former method relies on everything the investigator knows about a situation
(including where the particular test might occur in a language sampling
distribution) to make the judgment. The qualitative method would probably overlap
in the border regions if compared to strictly quantitative results.

Although qualitative scoring scales have an advantage in terms of
interpretability, quantitative scales are more advantageous in another aspect:
their amenability to statistical methods for modeling purposes. I discovered this
advantage of quantitative scores while working on Chapters 5 and 6. In Section 6.3.1
intelligibility is measured on a four point qualitative scale and the functions
which predict intelligibility are step functions. Statistical methods like
correlation and regression are not appropriate for the data. In Chapter 5,
percentage measurements of intelligibility are used and the scope of statistical
methods available for the analysis is very broad.


Both quantitative and qualitative scores have their advantages and thus we
might do best to record both. After an investigator has finished gathering
quantitative results on a test, he could make a qualitative judgment concerning the
degree of understanding. These judgments would be used in the analysis stage to
give meaning to the percent scores for the sake of interpretation.

The results in Chapter 5 illustrate something of the paradox surrounding the
use of quantitative versus qualitative scores. In Section 5.4 the relations
underlying percentage of intelligibility and percentage of lexical similarity are
very nearly the same in eight out of ten field studies. This gives credence to the
original use of percentage measurements. However, in Section 5.6 (Figure 5.8), when
these data are pooled, the standard error of estimate for predictions of percent
intelligibility is plus or minus 13%. This amounts to plus or minus 26% for a 95%
confidence band (see Section 4.4). This wide variation, in turn, suggests the
desirability of a discrete qualitative scale.

(5) Source of text - The intelligibility tests may be based on texts which are
elicited free narrative or on translations of predetermined texts. In terms of time
and effort, the elicitation method is optimal. It is easier and faster to elicit a
free narrative than it is to elicit a correct translation. Another advantage of a
free narrative is that the investigator can be reasonably sure that the syntax,
vocabulary, and semantics are natural. With a translated text he cannot. However,
the use of translated texts does have the advantage discussed already in Section
2.2.4. With translated texts the investigator can use a Latin Square experimental
design to control for variations in language sampling. Although he still cannot
ensure that intelligibility on the texts adequately measures intelligibility on the
whole language, he can ensure that all the measures between different dialects test
the same sampling of language.

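
The idea of a Latin Square layout can be illustrated with a short Python sketch.
This is an illustrative addition, not a description of the designs used in the
studies cited in Section 2.2.4: the cyclic construction, the labels, and the reading
of rows as groups of subjects and columns as texts are all assumptions. The point is
only that every dialect version appears exactly once in each row and each column.

```python
def latin_square(items):
    """Return a cyclic Latin square: row i is the item list rotated left by i."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


# Hypothetical labels: rows are groups of subjects, columns are texts,
# and each cell names the dialect version of that text the group hears.
dialects = ["D1", "D2", "D3", "D4"]
square = latin_square(dialects)
for group, row in enumerate(square, start=1):
    print(f"group {group}: {row}")
```

Because each group meets each dialect once and each text once, differences between
texts are balanced across the dialect comparisons.
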

(6) Sampling method - [...] more quickly. To administer the tests to a number of
individuals is optimal in that it yields more information. If written tests are
used, subjects would always be sampled as individuals. In the case of oral tests,
the decision is based on the goals of the investigator: if the investigator wants
to know only about the potential for understanding or if he can assume that the
population is homogeneous in its multilingual abilities then a group method can be
used. If he wants to know how a particular community varies in its ability to
understand, then an individual method is necessary.

In a group method, a number of subjects are tested collectively rather than
individually. There is a spokesman for the group who makes the responses to the
investigator. The members of the group are allowed to converse in their vernacular
language before making their responses to the investigator. In using this method,
Biggs argued that the responses would be near the upper limit for intelligibility,
since allowing discussion between subjects allows their best responses to come
out. A group responding together is likely to score higher than the
average of all the subjects responding individually. It allows the subjects to
score nearer their potential. The result is perhaps more like what individuals
would score if they had an hour or a day to listen instead of just two minutes. The
danger, of course, is that a few individuals who have learned the other dialect will
dominate the whole test while those who do not understand remain silent. If the
investigator senses that this is happening, he must ask other specific individuals
to respond in order to get a sampling of the group. When this is done the
investigator can actually record more than one score for the single testing
situation. He can observe that the majority understands at one level, while a few
understand at another higher level. Also the investigator may note the group's
reaction to the text: Are they attentive? Do they laugh when it is humorous?
When a qualitative judgment is made, the investigator need not rely exclusively on
spoken responses.

However, group testing hides variability in the population. When a group test is
used to sample a population, one cannot observe much more than that some understand
at one level while others are at another. More exact methods of sampling are
required in order to make precise statements about different levels of
understanding throughout the population. When some individuals understand another
dialect better than others, it is because not all individuals have had the same
amount of contact with the other dialect. If investigations have shown no reason to
suspect contact, if contact can be assumed to be uniform (for instance on the basis
of preliminary tests on individuals), or if the interest of the investigator is in
the upper potential, then group tests are appropriate. If the investigator wants to
know precisely how the population varies in its abilities to understand the other
dialect, then he must use tests on individuals. The results of such tests can be
used to build a profile of multilingual abilities in the community. Gillian Sankoff
(1968:169-173; 1969:846) has done this for the Buang of Papua New Guinea. In three
different plots she shows how the Mambump community's understanding of other Buang
dialects and of national and regional languages varies with sex, age, and level of
schooling.

When a dialect's intelligibility abilities are not homogeneous and individual
testing is used to discover what the composition is, then sampling becomes an
important issue. Sankoff tested 48 speakers in order to build the profile of the
Mambump dialect's multilingual skills. When the goal is to see how intelligibility
varies with different factors in the population, it is necessary to get a good
stratified sampling with respect to those factors (Miller 1977). For instance, if
differences in the understanding of men and women are to be compared then ideally
equal numbers of men and women should be tested. If age differences are to be
investigated, then equal numbers in each age bracket should be tested. The sample
chosen should represent a cross section of the whole population.
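
Drawing equal numbers from each stratum can be sketched as follows. This Python
example is an illustrative addition; the function name, the village register, and
the use of simple random selection within each stratum are assumptions, not a
procedure taken from the studies cited.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, per_stratum, seed=0):
    """Draw the same number of subjects at random from every stratum.

    population is a list of subject records, stratum_of maps a record to its
    stratum label, and per_stratum is the number to draw from each stratum.
    """
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    sample = []
    for label in sorted(strata):
        members = strata[label]
        if len(members) < per_stratum:
            raise ValueError(f"stratum {label!r} has only {len(members)} members")
        sample.extend(rng.sample(members, per_stratum))
    return sample


# Hypothetical village register: (name, sex) pairs invented for illustration.
villagers = [(f"M{i}", "male") for i in range(30)] + [(f"F{i}", "female") for i in range(20)]
chosen = stratified_sample(villagers, stratum_of=lambda p: p[1], per_stratum=5)
print(sum(1 for _, sex in chosen if sex == "male"),
      sum(1 for _, sex in chosen if sex == "female"))  # 5 5
```

To stratify on two factors at once, such as sex and age bracket, stratum_of can
return a pair of labels instead of a single one.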

In Casad's method, where ten subjects are tested, the size of the sample is not
sufficient to make inferences about the profile of the population. It can only
establish whether or not there is variability. Unfortunately, none of the
intelligibility surveys on which Casad reports have taken advantage of the fact that
ten subjects were tested in order to conclude something about the variability in the
population. Thus far they have considered only the average of the ten scores.
Casad does compute some standard deviations to illustrate a measure of variability
in an appendix on statistical measures (1974:170). In the set of thirteen scores,
the standard deviations for scores above 90% (including hometown scores) range from
6% to 8.5%; and for scores below 90% they range from 12% to 20%. Before any
inferences can be made about how large this variation actually is, the standard
deviations must be adjusted to account for the deviations in the hometown test (see
Sections 2.2.2 and 5.2.4 for adjustment of means). If there is a scatter in the
hometown results amounting to an 8% standard deviation, then we can assume that
whatever factors caused this scatter will cause at least that much scatter in other
tests. Whatever causes scatter in the hometown test is not intelligibility; all
subjects should theoretically score 100% intelligibility on their own dialect with
no deviations. Adjusting non-hometown standard deviations in the above Mazatec
example would cut them down by about half.

Casad (1974:171-3) goes on to show how the standard deviation is used to
compute the standard error of the mean and then to construct a confidence interval,
or range estimate, for the mean. When the intelligibility for a population is
reported as a range estimate, it is saying that the average intelligibility for the
population lies between two values with a given degree of confidence. It occurs to
me that this treatment of the results is actually hiding the variability which it
seeks to account for. It is assuming that what we really want to know is the
average intelligibility, so it accounts for variability by saying that the average
lies within a range.
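
The computation of such a range estimate can be sketched in Python. This is an
illustrative addition using the standard textbook construction (standard error
equals the standard deviation divided by the square root of the sample size, and the
interval is the mean plus or minus a critical value of Student's t times that
standard error). Whether Casad's appendix uses exactly this construction is an
assumption, as are the function name and the sample size of ten.

```python
import math

# Two-sided 95% critical values of Student's t for small samples (df -> t).
T_CRIT_95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}


def mean_range_estimate(mean, sd, n):
    """Return (low, high), a 95% confidence interval for the population mean."""
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError(f"no tabulated t value for {df} degrees of freedom")
    standard_error = sd / math.sqrt(n)
    margin = T_CRIT_95[df] * standard_error
    return mean - margin, mean + margin


# A mean of 87% with a 12.69% standard deviation; n = 10 is an assumption.
low, high = mean_range_estimate(87.0, 12.69, 10)
print(round(low, 1), round(high, 1))
```

Note that the interval narrows as more subjects are tested even when the spread of
individual scores stays the same, which is why a range estimate for the mean says
little about how understanding is distributed in the population.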

What the language planner needs to know is not the average nor its range, but 
the distribution of intelligibility. The planner may be interested in how well 
those at the low end of the distribution scored, or he may be more interested in the 
upper potential indicated at the high end. He may want to define the level of 
intelligibility for a population as a median (rather than a mean) which says 50%
understood better and 50% understood worse, or he may want to pick some other 
percentage. For instance, he may think it better to characterize the population by 
a level which has 80% of the population understanding that well or better and 20% 
understanding below that level. He may be interested in the differences between
sexes or he may want to concentrate on the responses of a certain age group. All of
these possible applications of survey results require a method that is sensitive to
the distribution of scores within a population. This area may prove to be the next
frontier in the refinement of intelligibility survey methodology. Thus far the work
of Sankoff (1968:164-176; 1969:846) serves as our only model.

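
Summaries that are sensitive to the distribution rather than to the mean alone can
be sketched in Python as follows. This is an illustrative addition: the scores are
invented, and the function level_reached_by is a hypothetical helper that returns
the highest score which at least a given fraction of the subjects met or exceeded.

```python
import math
import statistics


def level_reached_by(scores, fraction):
    """Return the highest score that at least `fraction` of subjects met or exceeded."""
    if not scores or not 0 < fraction <= 1:
        raise ValueError("need scores and a fraction in (0, 1]")
    ranked = sorted(scores, reverse=True)
    # Number of subjects who must have scored at or above the level.
    needed = math.ceil(fraction * len(ranked) - 1e-9)
    return ranked[needed - 1]


# Hypothetical percentage scores for ten subjects, invented for illustration.
scores = [40, 55, 60, 70, 75, 80, 85, 90, 95, 100]
print(sum(scores) / len(scores))      # 75.0, the mean
print(statistics.median(scores))      # 77.5, half above and half below
print(level_reached_by(scores, 0.8))  # 60, reached by 80% of the subjects
```

The same helper applied to the scores of one sex or one age bracket gives the
breakdowns mentioned above.
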

2.3.3 Subject profile and optimal methods

The choice of which intelligibility testing method is optimal for a particular
situation has a lot to do with the capabilities of the potential subjects. In this
section the relation between the subjects and the choice of an optimal method is
discussed. The potential subjects can be classified according to two dichotomous
variables. First of all, they may be classified according to language proficiency as
monolingual or bilingual. Specifically, bilingual means fluent in a common language
like a trade language or national language which the investigator also speaks;
monolingual means that there is no such common language. Second, the subjects can be
classified according to reading (and writing) proficiency as literate or illiterate.
These dichotomies are summarized as follows:

Language proficiency = (Monolingual, Bilingual)
Reading proficiency = (Illiterate, Literate)

In actual fact these are not dichotomies, but continua, and the two values 
given are the end points. The investigator must evaluate where the subjects as a 
whole fit on the continuum and decide, for instance, if they are bilingual enough to 
use a bilingual testing method or if they require a monolingual one. This point is 
considered in more detail below. 

The abilities of the subject will partly dictate the method of testing used. 
These two aspects of subject abilities interact directly with two of the testing 
variables, language of response and mode of response. That is, monolingual subjects 
require a vernacular approach while bilingual ones could use either a vernacular or 
a common language approach. Furthermore, illiterate subjects require an oral 
approach while literate ones could use either an oral or a written one. The optimal
methods for each of the four possible combinations of subject capabilities are as
follows:




Subjects: Monolingual, Illiterate 

Possible methods: Vernacular, Oral 
Optimal method: Vernacular, Oral 

Subjects: Monolingual, Literate 

Possible methods: Vernacular, (Oral, Written) 
Optimal method: Vernacular, Oral 

Subjects: Bilingual, Illiterate 

Possible methods: (Vernacular, Common), Oral 
Optimal method: Common, Oral 

Subjects: Bilingual, Literate 

Possible methods: (Vernacular, Common), (Oral, Written)
Optimal method: Common, Written 

When the subjects are monolingual and illiterate then only one method is
possible, the vernacular and oral method. If monolingual subjects happen to be
literate, then a written approach would be possible as well. However, it would not
be an optimal method in the case of a normal intelligibility survey. A vernacular,
written approach would require that a different set of test booklets be printed up
for each dialect where tests are conducted. It would take less time to administer
tests orally to a sampling of individuals than it would take to prepare the test
booklets and then test all subjects at once in a classroom sort of situation. The
exception to the optimality of an oral approach for monolingual literates would be
if the goal of the survey were not so much to test intelligibility between the
dialects in an area as to compile a detailed profile of the multilingual abilities
of a few dialects. In this case the investigation would be more like a census than
a survey and the printed test booklets would pay off.

When the subjects are bilingual, then common language approaches are always
optimal. The reasons have already been discussed in the preceding section:
the test materials are prepared only once and the investigator need not invest
months or years learning the local vernacular. In actual fact, the investigator may
find that the potential subjects are only partially bilingual or that only some of
the subjects are bilingual. In the first case, the hometown test serves as a check
on the bilingual abilities of a subject. If he can perform to satisfaction on the
hometown test, then his bilingual proficiency is not at issue. In the second case
of only a portion of the potential subjects being bilingual, the investigator must
decide if the bilinguals offer a good sample of the population or if they do not.
For instance, Gillian Sankoff's study of multilingualism among the Buang of Papua
New Guinea shows that the men understand New Guinea Pidgin significantly better than
the women, but that on tests for other dialects of Buang men and women do not differ
significantly (1968:169). This is evidence that among Buang dialects a common
language approach using Pidgin which tested only men would not bias the results by
leaving out women.

When the bilingual subjects are illiterate, then a common language oral 
approach is optimal. When the subjects are literate, then a common language written 
approach is optimal. The written approach is optimal in this case since the test 
booklets need be prepared only once. In the tests it is then possible to test a 
whole group of subjects individually in the same amount of time that one subject or 
one group collectively can be tested by an oral approach. 

The choice between testing subjects as groups or as individuals may be 
influenced by who the subjects are, in particular by what their culture is like. In 
American culture, for instance, individualism is stressed and individuals do not 
know most of the people that are near them on any given day. In Melanesian 
cultures, however, the group is stressed and everyone in the village knows everyone 
else. My wife and I found that a method of group testing in Melanesia was more in 
tune with the culture. Whenever we entered a village a large group of people 
gathered around us. To isolate an individual subject with earphones while the 
remaining subjects waited their turn never seemed quite right. Sankoff reports the 
same kind of situation; however, she developed a strategy by which individual 
testing became appropriate (1968:177-8). After arriving in a village she chatted 
with the welcoming group for a while but then explained that she was tired from the 
walk to the village and asked to be excused so that she could rest. By the time she 
arose, most people were out of the village at work in their gardens. Thus it was 
possible for her to walk through the village and find some people to interview and 
test in relative privacy. 

2.3.4 Summary of optimal methods 

The discussion in the last section concluded that for monolingual subjects, the 
optimal test method was a vernacular language oral method. Of the methods listed in 
Figure 2.1, Casad's and Sankoff's methods are the only ones which are appropriate. 
For bilingual subjects that are illiterate, the discussion indicated that common 
language oral approaches are optimal. The early methods (Section 2.2.1) and the 
method suggested in Section 2.1 fit this designation. Another alternative would be 
to modify Casad's or Sankoff's method to use common language questions (L. Simons 
1977:241). For bilingual subjects that are also literate, the discussion indicated 
that a common language written approach is optimal. The methods of Yamagiwa, Bender 
and Cooper, and Ladefoged (Section 2.2.1) are appropriate here. 

One aspect of the definition of optimality was time and effort. A method which 
yields the greatest amount of information with the least expenditure of time and 
effort is optimal. A deterrent to conducting an intelligibility survey or to 
completing one that has been started (of the 20 surveys Casad lists as having been 
conducted in Mexico, only 5 are listed as having been 100% completed, 1974:162) is 
the time required to conduct the survey. For the common language methods, the time 
required has pretty well been brought down to a minimum. In Section 2.1.5, I showed 
that the method I used in the Solomon Islands required one hour with an informant to 
collect a text and another hour alone to prepare the test tape. The early methods 
described in Section 2.2.1 might have taken a little longer since they made exact 
transcriptions and translations. The written methods with test booklets (Section 
2.2.4) would require a little longer to prepare the booklets. For testing, one hour 
was required to test a group on a battery of test tapes. For the early methods in 
which the subject's translation was recorded, it would take no longer to administer 
the tests, though it would require additional time at a later date to listen to the 
recorded responses and score them. For the written methods, a whole classroom of 
school children were tested individually in the time I could administer the tests to 
one group collectively. For these methods, the time needed to prepare the test tape 
for a dialect is about two hours. Testing in one dialect could take as little as 
one hour when group testing is used. 

In contrast to these, Casad's method for testing monolingual subjects requires 
two days per dialect by a two man team (Section 2.2.3). The preparation of the test 
tapes takes the first day; testing ten subjects takes up the second day. The method 
of preparing a test tape begins in the same manner as the other methods by eliciting 
a text and transcribing it. What takes so much longer is translating the set of 
questions that go with the test into each of the local dialects in which the test 
will be administered. After questions are translated they must be checked for 
accuracy with a pre-test panel and then dubbed into the test tapes. In addition an 
introductory tape is translated into the local dialect. In other words, a test tape 
must be redone for every dialect in which it will be tested. In the common language 
approaches a single test tape is made once and for all. This is the essential 
difference which makes test preparation require only two hours by a one person team 
as against one day by a two person team. 

Sankoff's method of testing orally in the vernacular would require only a few 
hours by a one person team to prepare a test tape. This is because she did all the 
questioning personally rather than recording the questions on the test tapes. Thus 
she made test tapes once and for all rather than remaking them for each dialect. 
Although this method is optimal in terms of test preparation time, it has a major 
drawback in another sense. It requires several months, or longer, of preparation 
time spent in language learning for the investigator to achieve sufficient facility 
in the various local dialects to do all questioning personally. 

I am aware of two methods for testing monolinguals in the vernacular which may 
help here. They have not received widespread attention in the literature, but they 
may reduce the time scale for vernacular intelligibility tests to a level comparable 
to that for the common language tests. They will do so in two ways: only a few 
hours would be required by one person to prepare test tapes, and learning of the 
vernacular would not be necessary. One could argue that these methods would yield 
results that were not as precise. However, I have already argued that the results 
of methods which yield percentage scores are already too precise for the level of 
statistical significance that can be attached to the results. 

The first method is a sentence repeat method which was tried by Crawford in a 
pilot intelligibility survey in Mexico. It was abandoned in favor of a content 
repeat test which was subsequently developed into Casad's method for testing 
intelligibility. The sentence repeat test was as follows (Casad 1974:60). A free 
text was elicited for the basis of the test. Every third sentence out of a portion 
of this text was extracted and played back to a subject one at a time. The subject 
was asked to repeat the sentence. Crawford evaluated the responses on a five-point 
scale. He observed that for highly intelligible dialects the sentence repeat became 
so easy that a subject's response seemed more like mimicry than a test of 
intelligibility (Casad 1974:61). However, this need not be viewed as a liability. 
It simply indicates that the test is not sensitive enough to distinguish between 
different degrees of high intelligibility. In most cases we do not need to do that 
anyway. 



Crawford observed that the results of the sentence repeat test showed little 
correlation to the results of the content repeat test. It was therefore dropped in 
subsequent studies. Casad, however, has suggested that it might be reinstated 
(1974:88). He credits Gudschinsky as saying that recent research in 
psycholinguistics has demonstrated that ability to mimic sentences of a different 
dialect is dependent on one's knowledge of both the grammatical structure and the 
phonological structure of that dialect. From my perspective, a great advantage of 
this kind of test is that the test tapes can be constructed very easily and the one 
tape will then serve for all tests on that dialect. 

A sentence repeat method could probably also be used by a survey technician who 
is a good phonetician but not a speaker of any local vernacular. He could rely on 
bilinguals in the village or on a bilingual traveling companion to explain how the 
testing would work. In scoring responses he would use the clues of immediacy of 
response, speed and timing of response, pitch contour of response, and phonetic 
similarity to the segmental phonemes of the utterance. A phonetic transcription of 
the utterances could serve as a standard against which to score. 

A second simple method for testing intelligibility among monolinguals has been 
used by Robert Conrad in the Sepik region of Papua New Guinea (L. Simons 1977:250): 

    This test consists of a number of simple questions such as, "Where is 
    your father?", "Who is your brother?", and "How far away is your garden?" 
    To construct a test tape a series of such questions is translated into the 
    dialect of the reference point and recorded on tape. A test is 
    administered by playing the questions one at a time to an informant at the 
    test point. The subject is permitted to respond in whatever way seems 
    most natural to him. If the subject answers the question, an appropriate 
    response is taken to indicate understanding of the question. If, on the 
    other hand, the subject prefers to translate the question, his translation 
    is scored as correct or incorrect. The percentage of questions to which 
    the subject gives an appropriate response is the measure of 
    intelligibility. 

Again, this method of testing requires that the test tape be created just once for 
all dialects. This method gives an interesting twist to the question approach. In 
the other question approaches, the subject hears a portion of text in the test 
dialect and is then asked a question about it in a language which he is sure to 
understand. In this approach, the text and the question are one and the same. 
Administering tests with this method would be bound to take half as long as in a 
text and question approach. With only the questions in the test, there is only half 
as much to play back. This method and the sentence repeat method deserve serious 
consideration as alternative methods for testing intelligibility among monolinguals. 



CHAPTER 3 
FINDING CENTERS OF COMMUNICATION 



The previous chapter presented a number of ways in which intelligibility 
between dialects can be measured. This chapter tells what to do next: examine the 
patterns of communication to find groupings of dialects which can be served by 
common vernacular language programs. This chapter offers practical suggestions on 
how to apply the results of intelligibility testing to language planning in an area. 
The analysis techniques presented offer answers to the questions of how many 
vernacular language programs are needed in an area, and where those programs should 
be centered. 

A vernacular language program is defined as any program which seeks to 
disseminate information by means of the vernacular language of a specific region. 
The medium of communication can be broadcasting, tape recordings, word of mouth, or 
literature. Literature programs are probably the most common. Materials produced 
in a vernacular literature program might include curricula and text books for 
primary and secondary education, translations of the Bible and liturgical materials 
by the church, or general and cultural reading materials for adult education. Such 
projects can be costly in terms of both money and effort. The strategy of the 
methods presented in this chapter is to find solutions which involve the least 
possible cost. 



Basically, the problem is one of grouping together dialects which can be served 
by the same vernacular language program. Section 3.1 discusses the main criteria 
for making such groupings: adequacy and least cost. Section 3.2 presents a simple 
inspection method which can be used to find groupings of dialects which fit the 
adequacy and least cost criteria. Section 3.3 gives a step by step description of 
the grouping algorithm which could be translated into a computer program. Finally, 
in Section 3.4, a similar method developed by Joseph Grimes is reviewed. 

3.1 The criteria of adequacy and least cost 

Many of the developing nations of the world face the difficult challenge of 
trying to communicate with a multilingual population, a population which may include 
well over a hundred dialects or languages. Even among nations where a national 
language is firmly established, gross dialect variations of the national language 
and pockets of minority languages still exist. It may not be thought feasible for a 
country to initiate vernacular language programs in every one of its languages and 
dialects. On the other hand, if that country wishes to reach all of its citizens, 
it must carry out its programs in languages that are both understood and accepted by 
all groups concerned. Fortunately, communicating with every citizen does not 
usually require a language program in every dialect. Intelligibility tests, such as 
those described in Chapter 2, show where communication can take place across dialect 
boundaries. The need then is for criteria by which we can join dialects into larger 
groupings that can be served by a single vernacular language program. The two 
criteria suggested are adequacy and least cost (Stoltzfus 1974:58-60). 

The task is to define groups of dialects such that all dialects within a group 
can be served by a single language program, centered in one of these dialects. 
Since communication is the purpose behind the language program, a possible criterion 
for grouping is that all the dialects in the group understand the central dialect. 
Intelligibility itself is not a strong enough criterion, however, since it can span 
such a wide range of degrees. The criterion of adequacy states that a dialect can 
be grouped with a central dialect if and only if its speakers understand the central 
dialect at a level which is deemed adequate for the intended purpose. Note that the 
level of adequacy is not fixed; it depends on the nature of the information to be 
communicated. For instance, if the purpose of the program were to broadcast news of 
current events, then the hearers would not be required to have as deep a command of 
the central dialect as if the purpose were to communicate about emotions, morality, 
or eternal values in a program of religious instruction. Note also that the 
criterion of adequacy says nothing about mutual understanding, but only about 
one-way understanding. That is, for a dialect to be grouped with a central dialect, 
it matters only that the former dialect adequately understand the central dialect. 
The degree to which the central dialect understands the other one is not relevant. 

The adequacy criterion, when used to interpret intelligibility relations in a 
region, will designate a number of possible central dialects and a number of 
possible groupings around them. By itself it is not strong enough to suggest the 
best grouping among the possible solutions. To do this, the second criterion is 
added. The criterion of least cost states that the best grouping is one which 
minimizes the number of central dialects. A major deterrent to vernacular language 
programs is the cost involved in studying the dialect, writing or translating the 
materials to be communicated, and then printing, recording, or broadcasting them. 
The total cost of vernacular language programs in an area is proportional to the 
number of central dialects in which specific programs are carried out. Thus if the 
grouping of dialects which requires setting up the least possible number of language 
programs is found, the least costly solution is normally also found. If the two 
criteria of adequacy and least cost are applied together, then such groupings will 
be found. The remainder of this chapter tells how this can be done. 



3.2 An inspection method for analyzing patterns of communication 

Patterns of communication can be diagrammed by drawing arrows onto a map of the 
test area. Simply by inspecting the pattern of arrows on the map, it is often 
possible to see a least cost solution which fits the given pattern. The method is 
basically this: (1) draw the patterns of communication on a map by representing 
each relationship of adequate understanding as an arrow from hearer to speaker, 
(2) find the dialect which is understood by the greatest number of other dialects 
and designate it as a center, (3) draw a loop which encloses all dialects that are 
reached by (that is, point to) that central dialect, (4) for all dialects remaining 
outside the loop, repeat the process beginning at step 2 and continue until all 
dialects are accounted for. 
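
The four steps above can be sketched in code. The following is only an illustrative sketch, not part of the original procedure: the function name greedy_centers, the tie-breaking rule (alphabetical order), and the sample chain of nine dialects are all assumptions made for the example. The sample data are invented and are not the data of any figure in this chapter.

```python
def greedy_centers(understands):
    """Inspection method, steps 2-4.

    `understands` maps each hearer dialect to the set of speaker dialects
    it understands adequately. Every dialect is assumed to understand
    itself, so that set need not list the hearer.
    """
    dialects = sorted(understands)      # alphabetical tie-breaking (assumption)
    remaining = set(dialects)           # dialects not yet inside a loop
    groups = {}
    while remaining:
        def reached(speaker):
            # Ungrouped hearers whose arrow points to this speaker.
            return {h for h in remaining
                    if h == speaker or speaker in understands[h]}
        # Step 2: the dialect understood by the most ungrouped dialects.
        center = max(dialects, key=lambda s: len(reached(s)))
        # Step 3: draw the loop around everything that points to the center.
        groups[center] = reached(center)
        # Step 4: repeat for whatever is still outside the loop.
        remaining -= groups[center]
    return groups


# Invented chain of nine dialects (hypothetical data for illustration only).
sample = {
    "AA": {"BB"}, "BB": {"EE"}, "CC": {"BB", "EE"}, "DD": {"BB", "EE"},
    "EE": {"BB", "HH"}, "FF": {"EE", "HH"}, "GG": {"EE", "HH"},
    "HH": {"EE"}, "II": {"HH"},
}
groups = greedy_centers(sample)
print(sorted(groups))
```

On this invented chain the sketch settles on three centers, although two centers (BB and HH) would reach every dialect, so the rule of picking the most widely understood dialect first should be read as a rule of thumb rather than a guarantee.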

The procedure is now illustrated with sample data from Santa Isabel in the 
Solomon Islands. Seven dialects are spoken on Santa Isabel (Whiteman and Simons 
1978; the data are adapted from Table 4). These dialects are, from northwest to 
southeast: Zabana, Kokota, Zazao, Blablanga, Maringe, Gao, and Bugotu. The 
communication patterns are set out in Figure 3.1. The dialects listed along the 
left hand side of the table are those of the hearers while the dialects listed along 
the top of the table are those of the speakers. A "yes" in the body of the table 
indicates that the given group of hearers understands the dialect of the speakers. 
In this case they claim to have a command of the dialect which allows them 
to speak it as well as hear it when communicating with someone from that region. This 
is defined as the level of adequacy for this analysis. A "no" indicates that the 
hearers do not understand the speakers at that level of adequacy. 



Figure 3.1 Intelligibility on Santa Isabel Island 

In the first step of the process, the patterns of communication are drawn onto 
a map. In this map all the instances of "yes" in Figure 3.1 are represented by an 
arrow pointing from the hearers to the speakers. This map is shown in Figure 3.2. 

The second step in the process is to find the dialect which is most widely 
understood. This is found by locating the dialect which has the most arrows 
pointing to it. In Santa Isabel this is Maringe (MAR). 

The third step is to map the extendability of the dialect just selected as a 
center. First the central dialect is underlined to indicate that it is a center. 
Then a loop is drawn which encloses all dialects that can understand the central 
dialect, but excludes all that cannot. Figure 3.3 shows the state of the analysis 
thus far. 

Finally the second and third steps are repeated for all of the dialects which 
remain ungrouped. In the Santa Isabel example, only two dialects remain, Zabana and 
Bugotu. No arrows lead away from either of these dialects. Therefore, the only way 
in which they can be reached by a vernacular language program is if these two 
dialects themselves are centers for such programs. Thus we conclude that two 
additional centers are required, one at Zabana and another at Bugotu. These two 
dialects are underlined in the map and loops drawn around them to show the 
extendability of their language programs. 

For the final map of the least cost solution, all extraneous arrows can be 
omitted, that is, omit all arrows which do not point to a central dialect. Figure 
3.4 gives the final least cost analysis for Santa Isabel. Note that the inclusion 
of three dialects is ambiguous. Kokota and Zazao could be part of either the Zabana 
or the Maringe program and Gao could be part of either the Maringe program or the 
Bugotu program. 

Some data from the Northern Mixteco of Mexico are now analyzed to demonstrate 
an extension of the method. This is the analysis of data over successively lower 
levels of adequacy to develop a contour-like map of possible dialect groupings. The 
data are set out in Figure 3.5. The values in the table are percentages of 
intelligibility. The periods represent relations that were not measured. The table 
is taken from Grimes (1974:264) with three adaptations: the values in the table are 
percentages of intelligibility rather than intelligibility loss, the matrix is 
transposed, and three dialects (CC, CO, and AP) are omitted since they were not 
tested and have no effect on the grouping. 

The patterns of communication are analyzed at successive levels of adequacy. 
First we might try 90% intelligibility as the level of adequacy. For the Northern 
Mixteco data, most of the hometown scores are not even 90%. There are no groupings 
at this level, so nine centers are required. Next, 80% intelligibility is taken as 
the level of adequacy. Any relations with 80% or more intelligibility are 
considered adequate, and any with less are not. Besides the hometown scores, only 
two relations are adequate at the 80% level, JE's understanding of CH and CS's 
understanding of CH. Thus at the 80% level, seven vernacular language programs 
would be required, one in CH to serve JE and CS as well, and then one in each of the 
other six dialects. When the level of adequacy is lowered to 70% intelligibility, 
the program at CH will extend to three more dialects, XB, PT, and ZP. The remaining 
three dialects still require their own programs. At the 60% level, a new group is 
possible: CU understands CZ at 66% intelligibility. Thus far, CG remains isolated. 
Only if we lowered the level of adequacy to 21% intelligibility would CG join in 
with the CH group. However, such a level of intelligibility is too low to conceive 
of as being very adequate, so the groupings are taken down only to the 60% level for 
the final presentation of results. 

Figure 3.5 Intelligibility in Northern Mixteco 

                       Dialect of speaker: 
    Dialect 
    of hearer:   CZ   JE   CH   CS   CG   XB   CU   ZP   PT 

       CZ        77    .    4    .    .    .    .    .    . 
       JE         .   81   83    .   32    ?    .    .    . 
       CH        15    .   89    .   20   56    .    .    . 
       CS         .    .   81   91   30   75    .    .    . 
       CG         .    .   21    .   81   19    .    .    . 
       XB         .    .   78    .   17   84    .    .    . 
       CU        66    .   23    .    .    .   86    .    . 
       ZP         .    .   73    .   16   37    .   75    . 
       PT         .    .   76    .   21   61    .    .   84 

    CZ = Cuyamecalco Zaragoza          JE = San Jeronimo 
    CH = Santiago Chazumba             CS = Cosoltepec 
    CG = Sta. Maria Chigmecatitlan     XB = Xayacatlan Bravo 
    CU = Sta. Ana Cuauhtemoc           ZP = Zapotitlan Palmas 
    PT = Petlalcingo 
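
The effect of lowering the level of adequacy can be checked with a few lines of code. This is a minimal sketch under stated assumptions: the dictionary understands_ch and the function served_from_ch are names invented for the example, and the scores are the percentages at which six hearer dialects understand CH, as read from Figure 3.5 and as implied by the discussion above.

```python
# Percent intelligibility of CH for each hearer dialect (CH's own hometown
# score is left out). Names are hypothetical; values are read from Figure 3.5.
understands_ch = {"JE": 83, "CS": 81, "XB": 78, "PT": 76, "ZP": 73, "CG": 21}


def served_from_ch(level):
    """Dialects that a program centered at CH reaches at this adequacy level."""
    return sorted(d for d, score in understands_ch.items() if score >= level)


for level in (90, 80, 70, 60):
    print(level, served_from_ch(level))
```

The group around CH is empty at 90%, takes in JE and CS at 80%, and adds XB, PT, and ZP at 70%. Nothing new joins at 60%, and CG would join only at 21%.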

The results of the Northern Mixteco grouping are shown as a map in Figure 3.6. 
The loops showing the extendability of the dialect groups are shown as before. The 
only difference is that loops for different levels of adequacy are superimposed on 
the same map; the result is like a contour map. The loops are labeled with two 
items of information: the minimum percentage of intelligibility which is the level 
of adequacy for the enclosed group, and the name of the dialect which is the center. 
The labeling of loops by the dialect which is the center is an alternative to 
indicating centers by drawing arrows as was done in Figure 3.1. When loops are 
drawn at successive levels of adequacy, then arrows will cross loop lines and more 
than one arrow from a dialect may be required since a dialect can shift to a new 
center at lower levels of adequacy. When relations become complex, labeling loops 
is a clearer way to indicate centers than drawing arrows. 

I have chosen to simplify the map by drawing only the loops which establish 
more inclusive groupings. A complete contour display would draw a loop around each 
dialect or group for each level. For instance, the CG dialect would be surrounded 
by four concentric circles, one for each of the intelligibility levels of 90%, 80%, 
70%, and 60%. The large CH group would have two loops around it for 70% and 60%. 
When all of the contour lines are drawn in, the relative distance between dialects 
can be found by counting the number of contour lines (divided by two) which separate 
them. 

Figure 3.6 Dialect groupings in Northern Mixteco 

A hypothetical set of data typical of a dialect chain is now presented as a 
warning that the simple procedure described in the first paragraph of this section 
will not always yield a least cost solution. (In Section 3.3 a more complex 
procedure which always does is presented.) In Figure 3.7a the patterns of 
communication for the hypothetical data are shown as arrows. Figure 3.7b shows the 
first solution one is likely to arrive at by following the simple procedure: 
dialect EE in Figure 3.7a has the most arrows pointing to it so we designate it as a 
center and draw a loop. Only two dialects remain outside the loop, AA and II, and 
they do not understand each other so we set each of them up as the centers for 
separate language programs. This result with three language programs is shown in 
Figure 3.7b. 

This result is not the least cost solution, however. Figure 3.7c shows that if 
dialects BB and HH are made centers, then all the dialects are reached with only two 
centers. In Figure 3.7b we went wrong by assuming that the dialect with the 
greatest number of arrows pointing to it had to be a center. Thus step 2 in the 
procedure, "find the dialect which is understood by the greatest number of dialects 
and designate it as a center," is not foolproof. However, it does turn out to be a 
handy rule of thumb which usually works. The least cost solution can be found by 
tracing the arrows which lead from the possible centers that were posited in the 
second pass over the data. In the first pass, EE is posited as a center. AA and II 
remain. Rather than just accepting AA and II as centers, we must see what points 
could serve them as centers as well. This line of inquiry points straight to 
dialects BB and HH and from Figure 3.7a the investigator can see that all dialects 
understand one of those two dialects. 

Figure 3.7 Hypothetical data 

(a) Patterns of communication 

(c) Least cost solution 

3.3 An algorithm for finding all possible least cost dialect groupings 

The procedure presented in Section 3.2 is simple and works well when there are 
not many dialects involved. However, if the investigator is not careful to trace 
out all the alternatives it may not yield the least cost solution, as was shown with 
the hypothetical data in Figure 3.7. As the number of dialects increases and the 
complexity of the pattern of arrows increases, this possibility becomes more likely. 
In this section, an algorithm for finding all possible least cost dialect groupings 
is presented. The algorithm is written in a prose format. However, it could be 
translated directly into a computer program which would determine least cost 
groupings automatically. 

3.3.1 The least cost grouping algorithm 

The algorithm is listed in Figure 3.8. It is to be read as a series of ordered 
steps. After each step is completed, the next step in the sequence should be 
performed unless there is a specific instruction to go to another step. Each of the 
steps is now discussed in turn. 

(1) The only input data to the algorithm is the matrix of intelligibility 
relations as measured by testing methods described in Chapter 2. The algorithm is 
repeated for different levels of adequacy. First a level of adequacy must be 
selected. Then the matrix of intelligibility relations is transformed into an 
adequacy matrix: all intelligibility relations which are of an adequate level 
become 1's in the adequacy matrix and those relations which are inadequate become 
0's. If there are values of intelligibility which were not measured and which have 
not been estimated by means of a predicting model (Section 6.2), then these also 
must be recorded as 0's in the adequacy matrix. The matrix of intelligibility 
relations from Santa Isabel in Figure 3.1 is already an adequacy matrix: "yes" is 
equivalent to 1 and "no" to 0. The intelligibility matrix for Northern Mixteco 
(Figure 3.5) requires transformation. Groupings were computed at four different 
levels of adequacy: 60%, 70%, 80%, and 90%. For each adequacy level, a separate 
adequacy matrix must be computed. For instance, if the 70% level were being 
computed, all values of intelligibility greater than or equal to 70% would become 
1's in the adequacy matrix, and all values less than 70% or missing would become 
0's. 
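
Step 1 can be sketched as a small function. This is an illustration only: the name adequacy_matrix and the use of None for an unmeasured relation are assumptions of the sketch, and the three-dialect excerpt (JE, CH, CS) is a reading of part of Figure 3.5 rather than the full table.

```python
def adequacy_matrix(intelligibility, level):
    """Turn percentages into 1's and 0's for one level of adequacy.

    Rows are hearers and columns are speakers. None marks a relation that
    was not measured; it is recorded as 0, as the text requires.
    """
    return [[1 if (value is not None and value >= level) else 0
             for value in row]
            for row in intelligibility]


# Rows and columns in the order JE, CH, CS (excerpt; None = not measured).
excerpt = [
    [81,   83, None],   # JE as hearer
    [None, 89, None],   # CH as hearer
    [None, 81, 91],     # CS as hearer
]
print(adequacy_matrix(excerpt, 80))   # one matrix per level of adequacy
print(adequacy_matrix(excerpt, 90))
```

A separate matrix is produced for each level, so the same percentages can be reused when the analysis is repeated at 60%, 70%, 80%, and 90%.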

(2) Two variables are maintained during the algorithm. The first, n, is set 
equal to the number of dialects which are speakers in the intelligibility and 
adequacy matrices. The second, c, represents the number of centers in the solutions 
which are currently being tried. Initially this is set to one. The strategy is 
this: first all possible solutions with one center are tried. If not all the 
dialects can understand any one of the dialects adequately, then all the solutions 
with two centers are tried. If no two dialects can adequately reach all the 
dialects, then all possible solutions with three centers are tried. This continues 
until a solution is found. Since the search begins with the least possible number 
of centers, one, and works up, the solutions reached are guaranteed to be the ones 
involving the least possible number of centers. 

Figure 3.8 Least cost grouping algorithm 

    (1) Select a level of adequacy, then transform the 
        intelligibility matrix into an adequacy matrix as 
        follows: 
            Set all adequate values to one. 
            Set all inadequate and missing values to zero. 

    (2) Set n = the number of dialects. Set c = 1 (the lowest 
        possible number of centers). 

    (3) Try all possible solutions with c centers. These 
        possible solutions are all of the possible 
        combinations of the n dialects taken c at a time. 
        The number of such combinations equals n!/(c!(n-c)!). 

        (3a) Test each possible solution by taking the 
             logical or of the adequacy matrix vectors for the 
             c centers as speakers. 

        (3b) If the logical or contains no zeros, then all 
             dialects understand at least one of the c centers 
             at an adequate level. Write down this solution 
             (but keep looking as there may be more). 

        (3c) Return to step 3a and test another possibility 
             until all possible combinations of c centers are 
             exhausted. 

    (4) If any solutions were found, go to step 6. Otherwise, 
        add 1 to c. 

    (5) If c = n then go to step 6: the solution is that each 
        dialect must have its own program. Otherwise, go to 
        step 3. 

    (6) The least cost solution (or solutions) for the given 
        level of adequacy has been found. If all desired 
        levels of adequacy have been analyzed, then quit. 
        Otherwise, go to step 1. 

(3) The third step is a complex step made up of three substeps. This step is
the heart of the algorithm; in this step the possible solutions are tested to find
the least cost solutions. A possible solution with c centers is any combination of
c dialects. The total number of such possible solutions is the number of possible
combinations of the n dialects taken c at a time. This number is defined by the
quantity n!/(c!(n-c)!), where n! is read n-factorial and equals the product
(n)(n-1)...(2)(1). For instance, the total number of combinations of 7 dialects
taken 3 at a time is:

\[
\frac{7!}{3!\,(7-3)!}
  = \frac{(7)(6)(5)(4)(3)(2)(1)}{(3)(2)(1)\,(4)(3)(2)(1)}
  = \frac{(7)(6)(5)}{(3)(2)(1)}
  = 35
\]

Thus for 7 dialects there are 35 possible combinations of 3 dialects that could
serve as centers.

For an example, all the possible solutions when there are four dialects, called
A, B, C, and D, can easily be enumerated. In the list which follows, the braces
enclose sets of dialects which serve as centers. Each set is a possible solution.
Note that the ordering of the dialects within the sets is immaterial:

Solutions with one center = 4 possibilities:

    {A}; {B}; {C}; or {D}

Solutions with two centers = 6 possibilities:

    {A,B}; {A,C}; {A,D}; {B,C}; {B,D}; or {C,D}

Solutions with three centers = 4 possibilities:

    {A,B,C}; {A,B,D}; {A,C,D}; or {B,C,D}

Solutions with four centers = 1 possibility:

    {A,B,C,D}
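The enumeration above can be reproduced mechanically. The following is a minimal sketch in Python; the function name `possible_solutions` is an illustrative choice and not part of the original procedure.

```python
from itertools import combinations
from math import comb

def possible_solutions(dialects, c):
    """Return every combination of c dialects that could serve as centers."""
    return [set(combo) for combo in combinations(dialects, c)]

dialects = ["A", "B", "C", "D"]
for c in range(1, len(dialects) + 1):
    solutions = possible_solutions(dialects, c)
    # The count always matches n!/(c!(n-c)!).
    assert len(solutions) == comb(len(dialects), c)
    print(c, "centers:", len(solutions), "possibilities")
```

Because each solution is built as a set, the ordering of dialects within a solution is immaterial, as noted above.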

(3a) Each possible solution is tested by taking the logical or of the adequacy
matrix vectors for the c centers as speakers. In the matrix for Santa Isabel,
Figure 3.1, the vectors for the centers as speakers are the columns; the column for
a dialect tells which other dialects understand that dialect adequately. An
acceptable solution is one in which each dialect understands at least one of the
centers adequately. An easy way to determine if all of the dialects understand at
least one center is to compute the logical or of the speaker vectors. The operation
of logical or yields zero if all its operands are zero; it yields one if at least
one of the operands is one. The logical or of the speaker vectors for the three
dialects which comprise the least cost solution for Santa Isabel is as follows:

The fact that all dialects understand at least one of the centers is indicated by
the fact that all elements in the result vector are ones.

Any other combination of three dialects on Santa Isabel yields an unacceptable
solution. For instance, the combination of ZAZ, BLA, and BUG leaves out two
dialects:

(3b) If the logical or vector contains no zeros, then all dialects understand
at least one of the centers at an adequate level. That set of c centers is
therefore an acceptable solution to be written down. One should not stop here,
however. All of the remaining possibilities with c centers must be checked to see
if there are other solutions. If the logical or vector does contain a zero,
however, then proceed without recording anything. Note that as long as the
acceptability criterion is stated as a result vector that contains no zeros (rather
than one that contains all ones), the operation of addition can be used as readily
as the logical or. In that case, the result vector would tell how many of the
centers each dialect understands adequately.

(3c) Return to step 3a and test another possible solution until all possible
combinations of c centers are exhausted.

(4) If any acceptable solutions were recorded in step 3b, then all of the least
cost solutions have been found for that level of adequacy. Jump to step 6.
Otherwise, add 1 to c in order to search for possible solutions containing one
additional center.

(5) If c equals n, the number of dialects, then the only solution is that each
dialect must have its own language program for that level of adequacy; proceed to
step 6. Otherwise, go back to step 3 and test all possible solutions with the
increased number of centers.

(6) The least cost solution (or solutions) for the given level of adequacy has
been found. If all desired levels of adequacy have been analyzed, then quit the
procedure. Otherwise, go to step 1, select a new level of adequacy, and repeat the
procedure.
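The six steps can be sketched as a short program. This is only an illustration in Python under stated assumptions: the scores are hypothetical (they are not the Santa Isabel data), missing scores are written as `None`, rows are taken to be hearers and columns speakers, and the function names are invented for this sketch.

```python
from itertools import combinations

def adequacy_matrix(intelligibility, level):
    """Step 1: 1 for adequate scores, 0 for inadequate or missing (None) ones."""
    return [[1 if score is not None and score >= level else 0 for score in row]
            for row in intelligibility]

def least_cost_solutions(adequate):
    """Steps 2-5. adequate[h][s] is 1 when hearers of dialect h understand
    speakers of dialect s adequately. Returns all solutions with fewest centers."""
    n = len(adequate)
    for c in range(1, n):                              # step 2; steps 4-5 advance c
        found = []
        for centers in combinations(range(n), c):      # step 3
            # Step 3a: logical or of the speaker vectors (columns) of the centers.
            result = [any(adequate[h][s] for s in centers) for h in range(n)]
            if all(result):                            # step 3b: no zeros
                found.append(centers)
        if found:                                      # step 4
            return found
    return [tuple(range(n))]                           # step 5: c equals n

# Hypothetical scores in percent; rows are hearers, columns are speakers.
scores = [
    [95, 85, 40, None],
    [82, 96, 50, 30],
    [45, 60, 94, 81],
    [30, 55, 88, 97],
]
for level in (80, 90):                                 # step 6: one pass per level
    print(level, least_cost_solutions(adequacy_matrix(scores, level)))
```

As in step 5 of Figure 3.8, the case of c equal to n is returned without testing, which assumes that every dialect understands itself adequately.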

3.3.2 Deciding among multiple least cost solutions

When more than one solution with a minimal number of centers is found, there
are at least four strategies that can be used to decide among possible solutions.
Each of the strategies involves applying a principle of least cost in some other
sense.

One strategy is to compare competing solutions for overall information loss.
Information loss is defined as the complement of intelligibility. Thus if
intelligibility is 85%, then the information loss is 15%. The total information
loss for each solution is computed. The solution which results in the least overall
information loss costs the least in that respect. In computing total information
loss, be sure that each dialect is counted only once; if a dialect understands more
than one center adequately, then group it with the center which it understands best.
This will minimize the information loss. The computation of information loss can be
refined by computing the average information loss per individual in the region. In
this way large dialects will carry more weight in the computation than small ones.
To compute the average information loss per individual, sum the product of
information loss times population for each dialect. Then divide by the total
population of the region.
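The population-weighted computation can be illustrated as follows. This is a sketch in Python with hypothetical scores and population figures; the function name and the use of `None` for untested pairs are assumptions made for the example.

```python
def average_loss_per_individual(centers, intelligibility, population):
    """Average information loss per individual for one candidate solution.

    intelligibility[h][s] is the percent score of hearers h on speakers s
    (None if untested). Each dialect is counted once, grouped with the
    center it understands best.
    """
    total = 0
    for hearer, row in enumerate(intelligibility):
        best = max(row[s] for s in centers if row[s] is not None)
        total += (100 - best) * population[hearer]
    return total / sum(population)

# Hypothetical data: four dialects, two competing two-center solutions.
scores = [
    [95, 85, 40, None],
    [82, 96, 50, 30],
    [45, 60, 94, 81],
    [30, 55, 88, 97],
]
population = [1000, 500, 2000, 500]
print(average_loss_per_individual((0, 2), scores, population))
print(average_loss_per_individual((1, 3), scores, population))
```

The sketch assumes every dialect has a score for at least one of the centers, which holds for any acceptable solution.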

A second criterion is logistic cost. Establishing a center for a vernacular
language program requires the transportation of personnel, equipment, and supplies
from an administrative headquarters to the dialect area. A logistic cost could be
assigned to each center. For instance, the center which was the most difficult and
expensive to travel to would be the most costly. The possible solutions could then
be compared for logistic cost by summing the costs of their individual centers.

A third dimension of least cost is sociocultural. Centers can be defined in
terms other than intelligibility adequacy. In Section 6.1.4 the social side of
centers in dialect systems is discussed. For instance, geography, population,
economy, and politics can define centers. So can linguistic similarity. In Section
6.3.2 many of the different criteria which define the center of the Santa Cruz
dialect system are listed. Ideally, the centers revealed by the analysis of
communication patterns should coincide with the sociocultural centers in the region.
If they do not, the materials which emanate from the language program may not be
accepted by the people. Thus the solution which best fits the pattern of
sociocultural centers may cost the least in terms of unacceptability.

A fourth aspect of least cost is stability of groupings. It is possible that
groupings at different levels of adequacy will be used in the same language program.
For instance, written materials for the beginning reader should be as similar as
possible to his hometown dialect, whereas experienced readers can tolerate more
variation and use literature with a wider extendability. If groupings are stable
over many levels of adequacy and if the loops on a map are always concentric rather
than crisscrossing, then the groupings lend themselves well to a strategy of
hierarchical inclusions for materials at different levels. If centers shift and
dialects regroup for different levels, then preparation of materials at different
levels is more costly. When evaluating competing solutions for a given level of
adequacy, the solutions for the level above and below should be consulted.



3.3.3 Computational refinements to the algorithm

The algorithm presented in Section 3.3.1, because it will eventually try all
possible solutions if necessary, can be a time consuming one. This section
describes some shortcuts that reduce the time required to analyze matrices,
especially by a computer program.

In the best case, that in which one center is adequate, only n solutions are
tested (where n is the number of dialects). In the worst case, that in which every
dialect requires its own language program, the number of solutions which are tested
before finding this out is 2 to the nth power minus 2. If there are 5 dialects,
then this number is only 30. If there are 10 dialects, this number is 1022. If
there are 20 dialects, over one million possible solutions have to be tested
(1,048,574 to be exact). Clearly something needs to be done to prevent testing
impossible solutions (which most of them are).
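The worst-case figure follows from summing the number of combinations for every value of c from 1 to n-1. A brief check in Python (the function name is illustrative only):

```python
from math import comb

def worst_case_tests(n):
    """Number of solutions tried before concluding that all n dialects
    need their own center: every combination with 1 to n-1 centers."""
    return sum(comb(n, c) for c in range(1, n))

for n in (5, 10, 20):
    # The sum of all combinations except the empty and the full set is 2**n - 2.
    assert worst_case_tests(n) == 2 ** n - 2
    print(n, worst_case_tests(n))
```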

Taking the logical or of a set of vectors is the easiest way to find out if a
solution is acceptable; however, in most cases there is an easier way to find out if
a solution is not acceptable. We could save many fruitless vector operations by
first making a simple test to see if the current solution is even possible.
Plausibility can be measured by noting the total number of 1's in the vectors being
considered. If the number of 1's is less than n, then those dialects as centers
could not possibly be an adequate solution. The advantage of using the number of
1's is that it need not be counted every time; rather, the number of 1's in a vector
can be counted once and for all at the beginning and stored with the vector. To
test the plausibility of a solution involving a set of c possible centers, the
counts for those vectors are summed. If the total is less than n, then the
vectors are not or'ed. It is much faster to sum c numbers than to or c vectors of
length n.

These refinements can be added to the algorithm in Figure 3.8 as follows: A
new instruction is added to the end of step 1, "Count and store the number of 1's in
each vector for the dialects as speakers." Step 3a becomes, "Test the plausibility
of the solution by summing the counts for the c vectors. If less than n, then go to
3c. Otherwise, take the logical or of the vectors."
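The revised steps 1 and 3a might look like this in a program. The sketch below is in Python and is not the author's implementation; the function name, the hypothetical adequacy matrix, and the counter `ors_taken` (added only to show how many vector operations are saved) are assumptions.

```python
from itertools import combinations

def least_cost_with_plausibility(adequate):
    """Least cost search with the plausibility test added to step 3a.
    adequate[h][s] is 1 when hearers h understand speakers s adequately.
    Returns (solutions, number of logical-or operations actually taken)."""
    n = len(adequate)
    speaker = [[adequate[h][s] for h in range(n)] for s in range(n)]
    counts = [sum(vector) for vector in speaker]   # counted once, in step 1
    ors_taken = 0
    for c in range(1, n):
        found = []
        for centers in combinations(range(n), c):
            if sum(counts[s] for s in centers) < n:
                continue                           # implausible: go to step 3c
            ors_taken += 1
            if all(any(speaker[s][h] for s in centers) for h in range(n)):
                found.append(centers)
        if found:
            return found, ors_taken
    return [tuple(range(n))], ors_taken

# Hypothetical adequacy matrices.
paired = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
isolated = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(least_cost_with_plausibility(paired))
print(least_cost_with_plausibility(isolated))
```

In the second matrix every dialect understands only itself, so every candidate fails the plausibility test and no vectors are or'ed at all.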

We can take this refinement even further. If in step 1 the matrix vectors are
rearranged in the order of the counts for the vectors (largest count first), then
it is possible to know that when the current combination fails, certain of the
remaining combinations will also fail. For instance, when testing for one center,
if the first vector fails the plausibility test, then so will all remaining vectors
since they have an equal or fewer number of 1's; it is possible to jump directly to
testing the possible solutions with two centers. Likewise, if the sum of counts for
the first two vectors fails the plausibility test, then processing can proceed
straight to three-center possibilities.
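The jump described here amounts to finding the smallest number of centers for which the largest counts could possibly cover all n dialects. A minimal Python sketch of just that part (the function name is an assumption, and this is not the full back-up-and-advance procedure):

```python
def smallest_plausible_c(counts, n):
    """Return the smallest c for which the c largest counts sum to at
    least n, or None if even all the vectors together fall short.
    Any smaller number of centers fails the plausibility test outright."""
    running = 0
    for c, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running >= n:
            return c
    return None

print(smallest_plausible_c([2, 2, 2, 2], 4))
print(smallest_plausible_c([3, 1, 1, 1, 1], 5))
```

The result is only a lower bound on the number of centers; the combinations with that many centers still have to be tested.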

The complexities in this refinement come in determining what to do after a
plausibility test fails. For instance, if two-center solutions are being tested and
the first combination to fail the plausibility test is the first with the fifth
vector, then there is no use testing any more solutions with the first vector.
However, it is necessary to begin testing solutions with the second vector. It is
still possible that two with three, two with four, and three with four would be
solutions, but never beyond the fourth. The method is to back up and advance the
vector preceding the one which failed. When all the vectors are adjacent, and the
plausibility test fails, then no more solutions for that many centers need be
tested. An algorithm for just this aspect of the plausibility testing is beyond the
scope of this discussion.

These refinements speed up the computation of least cost solutions, but it is
not yet clear just how much. Earlier it was stated that the total number of
possible solutions to test in the worst case of each dialect as its own center is 2
to the nth power minus 2. For large values of n this number takes on astronomical
proportions. However, this holds only for the algorithm in Figure 3.8. When the
algorithm is refined to order the vectors for the number of 1's they contain and to
make a plausibility test, only n plausibility tests will be made in the worst case,
because all vectors contain only one 1. The first plausibility test for every value
of c less than n will fail. Thus in the refined algorithm, the best case of one
center and the worst case of n centers require a processing time on the order of n.
For cases in between, more processing is required. At this point I do not know what
the maximum and average processing times for the refined algorithm are.

3.4 Grimes's optimization method for grouping dialects

The ideas and method presented in the first three sections of this chapter
grow directly out of Joseph Grimes's work on "Dialects as Optimal Communication
Networks" (1974). In this section I review his optimization method.

The optimization method is based on a principle of least cost. The method is
widely used in the field of economics where the principle of least cost is well
understood. A typical economic problem of this sort involves a manufacturer who
needs to distribute his product to consumers in a wide geographical area. He would
phrase the question of least cost something like this:

    In this geographical area, what is the most inexpensive way to supply
    every potential consumer with the product so as to assure greatest profits
    for the company?

For the manufacturer the most inexpensive approach could be one of several
alternatives. It might be to have one central manufacturing plant and to distribute
the products by truck. Or it might be less expensive to build small manufacturing
plants in each of the cities where the product is to be distributed. The most
inexpensive solution would probably involve a combination of assembly plants in
primary centers with trucking to secondary centers. The configuration of the most
economical solution is based on a compromise between the one-time cost of building
factories, the cost of operating them, and the cost of trucking. The economist can
assign a dollars and cents value to each possibility in order to determine the
solution which is the least expensive overall and will thus yield the greatest
profits.

Grimes (1974) applied this principle of least cost to the analysis of patterns
of communication. For the analysis of dialect groupings, he defined the question of
least cost like this (1974:261):

    In a geographical or social area, what is the smallest set of speech
    communities such that adequate communication at a given threshold level
    can be established with every individual in the area by using the speech
    of at least one of the communities?

In Grimes's analogy to the economics problem, the cost of building and
operating a factory corresponds to the cost of establishing a center for a
vernacular language program; the cost of trucking corresponds to the cost of
communicating with the other dialects. The cost of communicating is measured by the
amount of information lost; the cost of establishing a center is controlled by a
fixed cost value which is activated any time a dialect is a center. The fixed cost
makes it uneconomical to use more than a minimal number of centers. The fixed cost
is actually stepped through a series of values, called threshold levels, to yield a
series of groupings at different levels of adequacy. When the threshold level
(fixed cost) is low, then it is feasible to have many centers; when it is high, then
only a few centers can be afforded.

In Figure 3.9 the algorithm for Grimes's optimization method is written out in
a step by step prose format. Detailed instructions with examples on how to use the
method are given in three sources: Casad 1974, Grimes 1974, and Arden Sanders
1977b. Therefore I will not repeat those detailed instructions and examples here.
Rather, the listing of the algorithm serves as a point of reference for the
evaluation of the method which now follows.

The optimization method has four hidden pitfalls which its user must be aware
of: the interpretation of thresholds, the definition of least cost, the treatment
of missing values, and degenerate solutions. The first two problems can be treated
by reformulating the optimization method in the way I suggest in the following
discussion. It should be noted that Grimes has accepted these suggestions and now
uses the reformulated version of the optimization method. The least cost algorithm
of Section 3.3 also avoids these problems. The third pitfall of missing values
affects both the optimization method and my least cost method. The final problem
of degenerate solutions is avoided by using the least cost algorithm.

(1) The interpretation of thresholds - The interpretation of the threshold
levels has been incorrect. Grimes (1974:262) interpreted the thresholds as follows:

    For any communication effort [intelligibility loss] that is greater
    than the threshold level, the fixed-cost function renders it more
    economical to create another network than to add the test point concerned
    to an existing network. But for any communication effort that is not
    greater than the threshold level, the fixed-cost function renders it more
    economical to include the test point in an existing network than to create
    a new network with its own additional fixed cost.

Likewise, Casad (1974:46, 83ff.) speaks of an intelligibility threshold of 80%
corresponding to a communication cost of 20. He suggests that 80% intelligibility
is about the level of adequate intelligibility and thus that optimizations at the
fixed cost level of 20 give groupings for the 80% level of adequacy. This is where
the interpretation of thresholds goes astray: there is not a one to one
correspondence between fixed cost and adequacy or communication effort.

In the first place, the fixed cost, or threshold, value is sensitive to the
differences between intelligibility measures, not to their absolute values. This is
seen in the Northern Mixteco data (see Figure 3.5) which are optimized by Grimes
(1974:265). At the threshold level of 10, the dialects JE, CH, CS, XB, ZP, and PT
are assigned to the CH dialect as center. The communication efforts (or
intelligibility loss) for these dialects with CH are 17, 11, 19, 22, 27, and 24,
respectively. These correspond to intelligibility percentages of 83%, 89%, 81%,
78%, 73%, and 76%, none of which is greater than 90% as the interpretations of
Grimes and Casad would suggest. In each case, the communication effort is greater
than the threshold level of 10, but the fixed-cost function finds it most economical
to include the test points in an existing network. The error is in assuming that
the threshold level is compared to the communication effort. It is not; it is
compared to the difference in communication effort between two possible solutions.

Figure 3.9 Grimes's optimization algorithm

(1) Transform the intelligibility matrix into a cost matrix by changing
    each intelligibility score to a measure of information loss. (85%
    intelligibility = 15% loss)

(2) Select the fixed cost (threshold) level.

(3) Initially assign each test point (dialect of hearer) to a center. The
    initial center is the reference point (dialect of speaker) which it
    best understands (lowest information loss). This is generally
    itself. Throughout the analysis, any reference point with at least
    one dialect assigned to it is a center; if no dialects are assigned
    to it, it is not a center.

(4) Step through the cost matrix comparing all possible pairs of reference
    point vectors. First compare the first with the second, the first
    with the third, and so on to the nth. Then compare the second with
    the third, with the fourth, and so on until all pairs are compared.

    (4a) Compute the cost for the two reference point vectors. If the
         first reference point is a center add in the fixed cost; if the
         second reference point is a center add in the fixed cost. For
         all the dialects assigned to either reference point, add in the
         information loss. If the cost is zero (neither dialect is a
         center) then repeat step 4a on the next pair of reference points.
         Otherwise, continue.

    (4b) Now try one of the following three things in an effort to minimize
         the cost for the two reference points: (1) take all dialects
         assigned to the second reference point and reassign them to the
         first one, (2) take all dialects assigned to the first reference
         point and reassign them to the second one, and (3) take all
         dialects assigned to either of the reference points and reassign
         them to the one which results in the lowest information loss.
         (When both reference points are centers, the first two options
         may reduce the cost by requiring one less center, while the third
         option may reduce it by minimizing information loss.)

    (4c) Recompute the cost for the two vectors for each of the three
         possibilities. If one of the three reassignments yields a lower
         cost than the original cost from step 4a, then shift the
         assignments to the least cost configuration. If there is a tie
         for the least cost, the first option has first priority, the
         second has next, and the third last.

    (4d) Go back to step 4a and process the next pair.

(5) During the whole pass through the matrix in step 4, if no assignments
    were shifted in step 4c, then go to step 6. Otherwise, return to
    step 4 and make another pass.

(6) The optimal (least cost) solution for the given threshold value has
    been found. If desired, go back to step 2 and optimize for another
    threshold. Otherwise, quit.

The example given in Tables 2 and 3 of Grimes (1974:264-265) illustrates this
point. The example is reproduced here in Figure 3.10. Before optimization, the XB
dialect is assigned to itself as center (designated by the asterisk). The
communication effort is 16. If XB were assigned to CH, the communication effort
would increase to 22. Since 22 is greater than the threshold value of 10, the
interpretation in the above quotation would suggest that XB cannot be assigned to CH
at this threshold level. However, increasing the communication effort from 16 to 22
is accompanied by a decrease of total fixed cost factors from 20 to 10, since one
less center is required. The overall effect is a decrease in cost and thus the
solution with one center is optimal for a threshold of 10, even though the
intelligibility loss is 22. XB was joined to the existing network because the
difference in communication costs was less than the threshold value.



Figure 3.10 Threshold corresponds to differences,
            not actual cost

a. Before regrouping

                             Test points
                                                           Fixed  Total
               CZ   JE   CH   CS   CG   XB   CU   ZP   PT   cost   cost
Reference  CH  96   17*  11*  19   79   22   77   27   24     10     64
points     XB 999   26   44   25   81   16* 999   63   39     10

b. After regrouping

                             Test points
                                                           Fixed  Total
               CZ   JE   CH   CS   CG   XB   CU   ZP   PT   cost   cost
Reference  CH  96   17*  11*  19   79   22*  77   27   24     10     60
points     XB 999   26   44   25   81   16  999   63   39      0
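The totals in Figure 3.10 can be checked with a few lines of Python. The function `pair_cost` is a hypothetical helper that mirrors step 4a of Figure 3.9: a reference point contributes the fixed cost only if at least one test point is assigned to it.

```python
def pair_cost(assigned_losses, fixed_cost):
    """Cost for a pair of reference points (step 4a of Figure 3.9).

    assigned_losses holds one list per reference point, containing the
    information loss of each test point currently assigned to it.
    """
    total = 0
    for losses in assigned_losses:
        if losses:                       # a reference point with members is a center
            total += fixed_cost + sum(losses)
    return total

# Starred values from Figure 3.10, threshold (fixed cost) of 10.
before = pair_cost([[17, 11], [16]], fixed_cost=10)      # CH: JE, CH; XB: XB
after = pair_cost([[17, 11, 22], []], fixed_cost=10)     # XB moved to CH
print(before, after, before - after)
```

The communication effort rises by 6 (from 16 to 22) while the fixed cost falls by 10, so the regrouping lowers the total by 4.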



This quirk in the method does not appear in Casad's examples because in every
case he uses matrices in which the raw scores are adjusted to raise hometown scores
to 100% (a cost of 0). Therefore when a regrouping would shift a dialect from
itself to another dialect as center, the difference in communication costs is the
cost with the other dialect minus zero. In other words, in this special case, the
threshold level does correspond directly to the communication cost. When raw
intelligibility scores are optimized, and the threshold values are interpreted as
corresponding directly to intelligibility levels, then the raw scores are actually
being subjected to an implicit constant adjustment for subject abilities (Section
5.2.4). That is, it is as though the difference between the hometown score for the
subjects and 100% has been added to all intelligibility scores for that group of
subjects.

Whether communication cost is based on raw or adjusted intelligibility scores,
there will not be a correspondence between threshold level and intelligibility level
when regroupings involve shifting more than one dialect at a time. In Figure 3.11 a
hypothetical example is given. Such situations do arise in field data (Arden
Sanders 1977:302 points out an example in the Mazatec data). However, the point is
easier to see if a minimal example is constructed. The example shows two reference
points (the speakers) and three test points (the hearers). The optimization is for
the threshold level of 20. In Figure 3.11a, AA is the center for itself, and BB is
the center for BB and CC. Since two centers are involved and the communication cost
of CC grouped with BB is 5, the total cost for this configuration is 45. Figure
3.11b shows the attempt to reduce the cost by using one center instead of two. To
shift all the dialects to AA as the center looks plausible since the communication
effort for BB with AA and for CC with AA is 15. This is less than the threshold
level of 20. However, since two dialects are going to be regrouped this amounts to
a total information loss of 30. The total cost including the fixed cost value is
50 and is higher than the solution using two centers. Therefore, all of the
dialects would group together with one center at a level of 85% intelligibility, but
not at a fixed cost threshold of 20.
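The arithmetic of this hypothetical example can be laid out in Python. The hometown losses of zero for AA and BB are an assumption of the sketch; they are consistent with the totals of 45 and 50 given above. The function name is illustrative.

```python
def total_cost(assignment, loss, fixed_cost):
    """Fixed cost for every center in use plus the summed information loss.
    assignment maps each test point to the center it is grouped with."""
    centers = set(assignment.values())
    communication = sum(loss[pair] for pair in assignment.items())
    return fixed_cost * len(centers) + communication

# loss[(test point, reference point)]; hometown losses assumed to be 0.
loss = {("AA", "AA"): 0, ("BB", "BB"): 0, ("CC", "BB"): 5,
        ("BB", "AA"): 15, ("CC", "AA"): 15}
two_centers = {"AA": "AA", "BB": "BB", "CC": "BB"}
one_center = {"AA": "AA", "BB": "AA", "CC": "AA"}

print(total_cost(two_centers, loss, fixed_cost=20))
print(total_cost(one_center, loss, fixed_cost=20))
```

Each individual loss of 15 is below the threshold of 20, yet the one-center configuration still costs more because the two losses are summed.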

The conclusion is that the threshold value does not correspond directly to the
intelligibility level. It corresponds to the difference in summed communication
cost for two possible solutions. Thus it is difficult to give threshold values an
interpretation which is both consistent and meaningful when applying results in the
real world.

(2) The definition of least cost - It is in the definition of least cost that
Grimes's original analogy to the transport problem breaks down and leads to the
misinterpretations just discussed. We saw this in the last example where the
threshold level of 20 blocked the regrouping of two dialects with a communication
cost of 15. The question we must ask is, "Are two fifteens worse than one twenty?"
In economics, losing two fifteen dollar checks is certainly worse than losing one
twenty dollar check. In the economic transport problem, the units by which the cost
of building a factory and the cost of trucking goods are computed and compared are
the same dollars and cents. This is what makes the optimization algorithm work.
However, in the intelligibility analogy, the two kinds of cost are not comparable.
Communication cost is measured in terms of information loss while establishing
centers for vernacular language programs is measured in terms like personnel,
transportation, equipment, and supplies. The analogy further breaks down when the
meaning of information loss is examined. Is it worse for each of two people to lose
15% of the information in a message than it is for one person to lose 20% or 25%? I
would think not.

The definitions of the criteria of adequacy and least cost I presented in
Section 3.1 are the same as Grimes and Casad have in mind when they describe the
optimization method. They define the problem as being one of finding the smallest
possible set of centers (least cost criterion) capable of establishing communication
at an adequate level (adequacy criterion) with the entire area (Grimes 1974:261,
Casad 1974:37). In Figure 3.11 we saw that the optimization method does not
actually do this, if we try to interpret the threshold levels in terms of levels of
adequacy.



The optimization method can be reformulated as follows to find solutions which
fit the two criteria of adequacy and least cost defined in Section 3.1. The
reformulation is expressed as changes to the algorithm in Figure 3.9. In step 2,
the threshold level becomes the level of adequacy. Fixed cost is given a different
meaning in step 4. The adequacy level is used to determine if dialects can be
shifted to a new center. If their understanding of the new center is adequate, they
can; if it is not, they cannot. In step 4a, the total cost is defined in a
different way. The fixed cost associated with establishing a center is an
arbitrarily high constant at all levels of adequacy. It is so high that the sum of
information loss for a center will never exceed it (n times 100, for instance).
The total cost for two reference point vectors is then computed as before. In step
4b, for each of the three options, dialects which understand the potential new
center at an adequate level (that is, information loss equal to or less than the
threshold value) are shifted. Otherwise, dialects cannot be shifted. In step 4c,
that of finding the least cost configuration for the two vectors, the arbitrarily
high fixed cost ensures that a configuration with one center will always cost less
than one with two centers. When comparing configurations with the same number of
centers, the one with the least overall information loss costs less. The
modifications then are these: the threshold equals level of adequacy, the fixed
cost is an arbitrarily high constant, and dialects can shift to a new center only
when the information loss is within the level of adequacy.
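To illustrate the reformulation, the Python sketch below applies it to the hypothetical three-dialect example discussed earlier (AA, BB, and CC, with hometown losses assumed to be zero). The function names and the dictionary layout are assumptions made for the example, not part of Grimes's program.

```python
def can_shift(test_points, new_center, loss, adequacy_loss):
    """Step 4b, reformulated: a shift is allowed only if every dialect
    understands the new center within the level of adequacy."""
    return all(loss.get((t, new_center), float("inf")) <= adequacy_loss
               for t in test_points)

def reformulated_cost(assignment, loss, n):
    """Steps 4a and 4c, reformulated: the fixed cost is an arbitrarily high
    constant (here n times 100), so fewer centers always cost less."""
    fixed_cost = n * 100
    centers = set(assignment.values())
    return fixed_cost * len(centers) + sum(loss[pair] for pair in assignment.items())

loss = {("AA", "AA"): 0, ("BB", "BB"): 0, ("CC", "BB"): 5,
        ("BB", "AA"): 15, ("CC", "AA"): 15}
two_centers = {"AA": "AA", "BB": "BB", "CC": "BB"}
one_center = {"AA": "AA", "BB": "AA", "CC": "AA"}

# Level of adequacy of 80% intelligibility, that is, a loss of at most 20.
if can_shift(["BB", "CC"], "AA", loss, adequacy_loss=20):
    print(reformulated_cost(two_centers, loss, n=3),
          reformulated_cost(one_center, loss, n=3))
```

With the high fixed cost the one-center configuration now wins, which is the grouping expected at an 80% level of adequacy.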
(3) The treatment of missing data values - Grimes lists one of the advantages of
the optimization method of dialect grouping as being that it gives "useful results
from matrices that can be filled in only partially" (1974:261). It is true that the
method will give results from incomplete data, but using incomplete data can be
hazardous to the unwary investigator. This is true not only of the optimization
method but of the methods I presented in Sections 3.2 and 3.3 as well. It is
important to understand the effects of missing data.

The grouping algorithms do not actually operate on matrices with holes in them.
The investigator does actually fill in all the holes created by missing data. In
the case of an adequacy matrix for the method of Section 3.3, missing values always
transform to zeros. In the case of a cost matrix, missing values always transform
to arbitrarily high amounts of information loss. The result is that when there is a
missing value, it is never possible for the dialect of the hearers to group with the
dialect of speakers for which it was not tested.
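The two transformations can be written out explicitly. In this Python sketch missing scores are represented by `None`, and the filler value 999 is an assumption modeled on the entries that appear to play this role in Figure 3.10; the scores themselves are hypothetical.

```python
MISSING_LOSS = 999   # arbitrarily high information loss for untested pairs

def to_cost_matrix(intelligibility):
    """Cost matrix: loss is 100 minus the score; missing becomes very high."""
    return [[MISSING_LOSS if s is None else 100 - s for s in row]
            for row in intelligibility]

def to_adequacy_matrix(intelligibility, level):
    """Adequacy matrix: missing values become zeros, like inadequate ones."""
    return [[0 if s is None or s < level else 1 for s in row]
            for row in intelligibility]

scores = [[90, None],
          [70, 95]]
print(to_cost_matrix(scores))
print(to_adequacy_matrix(scores, 80))
```

Either way the untested pair is indistinguishable from a pair that was tested and found unintelligible, which is the hazard discussed in this section.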

One effect of this is seen in matrices which are not square. Casad
(1974:44-45) illustrates the optimization method on data from the Ocotlan Zapotec
area of Mexico. In that intelligibility survey, 7 test tapes were used but they
were tested in 10 dialects. The published matrix has 7 rows and 10 columns.
Therefore, there are no hometown scores for three dialects. This means that those
three dialects cannot even have themselves as a center; at least that is what a
computer program which optimized the 7 by 10 matrix would assume. For one of these
three dialects (San Andres, AN, column 8) the lowest information loss in the matrix
is 15% with Ayoquezco (row 5). This means that even at the zero threshold level
San Andres groups with Ayoquezco. Casad rectifies the situation in the dialect map
(page 45) where San Andres remains isolated until it groups with Ayoquezco at the 15
threshold. However, he would not have gotten that result if he had strictly applied
the optimization algorithm to the cost matrix on page 44.


The same example from Casad illustrates another effect of missing values.
Since a dialect cannot group to a reference point for which it was not tested, there
can be a grouping which includes all dialects if and only if there is a reference
dialect on which all dialects were tested. In the Ocotlan Zapotec matrix there is
no such reference dialect. Ocotlan is the main reference point with seven dialects
tested on it. But the three dialects which were not tested on Ocotlan cannot group
with Ocotlan. Actually, if the matrix (page 44) were optimized up to the 100
threshold and beyond, there would remain three disjoint dialect groups: 2 and 6;
7, 9, and 10; and the other five dialects. In mapping the dialect network, Casad (page
45) estimated some missing values in order to allow the groupings to converge. For
instance, the convergence of the DO-TI group with the IN-OC-WA group at the 26
threshold depends on the estimation that the missing value of intelligibility loss
for DO on OC is equal to or less than 26.
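
The condition stated above, that an all-inclusive grouping requires a reference
dialect on which every dialect was tested, is easy to check mechanically. The
sketch below uses a small hypothetical matrix (not Casad's data), with None
marking an untested cell; the function name is my own.

```python
def complete_references(matrix):
    """Rows (reference dialects) on which every column (test point) was tested."""
    return [r for r, row in enumerate(matrix) if all(v is not None for v in row)]

# Hypothetical information-loss matrix: 3 test tapes (rows), 4 test points (columns).
measured = [
    [0, 10, 30, None],
    [12, 0, None, None],
    [None, None, 5, 0],
]
print(complete_references(measured))    # no row is complete, so no single grouping

# Estimating the one value that reference 0 lacks makes convergence possible.
estimated = [row[:] for row in measured]
estimated[0][3] = 40                    # a hypothetical estimate, not a measurement
print(complete_references(estimated))
```

An empty list means the groupings cannot converge into one, no matter how far
the threshold is raised.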

The hazards of unwittingly applying these grouping techniques to incomplete
data are dramatized by an analysis of the results from the intelligibility survey of
Santa Cruz Island. The intelligibility matrix is found in Table 2.2.6 of Appendix
2. In Figure 3.12 the least cost grouping technique of Section 3.3 is applied to
these data. The top half of the figure contains a map showing the least cost
solution on the incomplete data matrix. The level of adequacy is 3, or full
intelligibility. The least cost solution calls for two centers, one at NEO and one
at NEA. Even if the level of adequacy is lowered to 2, partial intelligibility, two
centers are required. The intelligibility matrix is far from complete, however.
When the survey was conducted, it was known that the dialects of LWO and BAN were
the central ones on the island in terms of geography, population, and community
facilities (Sections 6.1.4 and 6.3.2). Informant opinions showed that everyone
claimed to understand these central dialects (Appendix 2.1.8). There were thirteen
dialects involved in the survey and it was not feasible to test all dialects against
all others. (To be exact, 77 out of the 169 possibilities, or 45%, were tested.)
The test tapes which were most frequently not used were those from the central
dialects. When a group claimed full intelligibility with the central dialects and
then scored full intelligibility on a dialect which was beyond the centers in the
dialect chain (such as when southern dialects were tested on NEO), then
intelligibility with the central dialects was a sure conclusion and not tested.
Efforts were concentrated on results that could not be interpolated.

Figure 3.12 Dialect groupings on Santa Cruz Island
b. With matrix completed by estimations


Thus it turns out that the central and most widely understood dialects were not
actually tested for in the intelligibility tests. The result is that it is
impossible to show the true least cost dialect grouping for the island from the
measured intelligibility matrix. In Figure 3.12b a complete intelligibility matrix
for Santa Cruz is analyzed by the same technique. The matrix was completed by
estimating missing values (Appendix 2.1.11) with the predicting model developed in
Chapter 6. The result is that one dialect, BAN, can serve the whole island as a
center for a vernacular language program.

(4) Degenerate solutions - One characteristic of the optimization method which
has not yet been mentioned in the intelligibility literature is that it can lead to
degenerate solutions. These are solutions which may not be unique. In step 4c of
the algorithm (Figure 3.9), when two or more reconfigurations of two vectors lead to
equal and optimal reductions in cost, the algorithm specifies that shifting all
dialects to the first vector has top priority, shifting them to the second vector
has next priority, and reshuffling them between the two vectors has lowest priority.
This is where degeneracy can arise. The method always picks one of the optimal
configurations and ignores the rest. It just may be that following one of the ignored
configurations would have led to another solution which was equally optimal.


The algorithm follows only one path at a time and therefore yields only one
solution. For a given matrix, more than one solution could give the same minimum
cost, or there could be a number of solutions with a minimal number of centers (but
not necessarily minimum information loss). Furthermore, for a given set of centers,
there could be a number of ways in which all the dialects could group with those
centers. However, the optimization method always gives only one possible solution.
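
Degeneracy is easy to demonstrate on a toy matrix. The sketch below is a
brute-force enumeration, not the optimization algorithm of Figure 3.9 and not the
least cost algorithm of Section 3.3; it simply lists every set of k centers that
achieves the minimum total information loss. The matrix and function names are
hypothetical, and each dialect is assumed to group with whichever chosen center
gives it the least loss.

```python
from itertools import combinations

def total_loss(loss, centers):
    """Sum over hearers (columns) of the least loss to any chosen center (row)."""
    n_hearers = len(loss[0])
    return sum(min(loss[c][h] for c in centers) for h in range(n_hearers))

def all_best_center_sets(loss, k):
    """Return the minimum total loss and every k-center set that attains it."""
    scored = [(total_loss(loss, cs), cs) for cs in combinations(range(len(loss)), k)]
    best = min(score for score, _ in scored)
    return best, [cs for score, cs in scored if score == best]

# Hypothetical symmetric chain of three dialects.
loss = [
    [0, 10, 20],
    [10, 0, 10],
    [20, 10, 0],
]
best, ties = all_best_center_sets(loss, 2)
print(best, ties)   # three different pairs of centers tie at the same cost
```

A method that follows a single path would report only one of the tied pairs.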

This drawback of the optimization method, that it gives only one solution, is
countered in the least cost grouping algorithm of Section 3.3. The tradeoff is one
of computing time. In the best case of one center, the least cost algorithm is much
faster. In the worst case of n centers, the optimization algorithm is much faster
than the least cost algorithm of Section 3.3.1, although with the refinements in
Section 3.3.3 the latter would be faster than optimization. For the cases in
between, it is not yet clear how they compare. I hope that the refinements
suggested in Section 3.3.3 will make a computer program of the least cost algorithm
run fast enough to be useful for large data matrices with complex solutions. If so,
problems of degeneracy can be bypassed.

CHAPTER 4

EXPLAINING COMMUNICATION: A MODELING APPROACH

This chapter considers modeling itself; the next two explore the two main
components of models for explaining communication: linguistic similarity and social
relations.

Chapter 4 begins with a discussion of the meaning and advantages of modeling,
especially with regard to explaining communication. This is followed by a
consideration of the state of the art for the social sciences in general and for the
communication problem specifically. Finally, a basic model for explaining
communication is presented. This model suggests that communication, or
intelligibility, is based primarily on two factors: the linguistic similarity
between dialects and the social relations between them.



In Chapter 5 the factor of linguistic similarity is considered in detail. Data
from ten different field studies are analyzed in order to explore the relationship
between lexical similarity and intelligibility. As a conclusion, a general model
for expressing this relationship is proposed. Even though social relations are not
considered, the model proves to be 70% accurate in predicting intelligibility from
lexical similarity.

In Chapter 6 the second factor, that of social relations, is considered in
detail. After a general discussion, data from the island of Santa Cruz, Solomon
Islands, are considered. A comprehensive model which embraces social
relationships as well as the linguistic ones of Chapter 5 is used to explain
communication among the dialects on the island. The predictions derived from this
model are [...] accurate.

4.1 Why build models?

A model is a hypothesis about how something in the real world behaves. The
models presented in the next two chapters are mathematical ones. This means that
the hypotheses are stated in precise mathematical terms, in this case by numerical
equations. Because the hypotheses are precise, they can be tested with precision.
This is the real virtue of the modeling approach: a hypothesis can be
empirically tested against observed data, with the result that the investigator knows
exactly to what extent the model fits the data and to what extent it does not. When
an acceptable model is found it can be used for one of two purposes: to explain the
relations underlying what has already been observed, or to predict a
particular variable in the model when the values of the other variables are known.

In the next two chapters, after the models for explaining communication are
discussed, they are formulated mathematically, then tested empirically with field
data. As a result we know exactly how much of communication is explained by the
models and how much is not. But why would we want to explain
communication anyway?

Models for explaining communication can be applied to real world situations in
at least three beneficial ways. First, they help us to understand patterns of
communication. The techniques for measuring intelligibility discussed in Chapter 2
tell us only whether or not communication can take place and to what extent. The
methods for analyzing patterns of communication discussed in Chapter 3 allow us to
extract general patterns of communication and locate centers within the network of
intelligibility relationships. However, neither method explains why there is
communication at all or why the patterns of communication should be what they are.
The models developed in the following chapters help us to do this. By understanding
why patterns of communication are what they are, and not just what they are, leaders
can make much better proposals about language planning in an area.

Secondly, the modeling approach, because it is also predictive, may reduce
many of the logistic problems associated with intelligibility testing. An
intelligibility survey is time consuming and sometimes difficult to carry out. If
the level of intelligibility between dialects could be predicted, then we might be
saved the task of trying to measure it.

Even when an intelligibility survey is carried out, it may not be feasible to
test the intelligibility between all possible pairs of dialects when there are more
than five or six dialects involved. In those cases, a predicting model can be used
to estimate the untested intelligibility scores. For instance, in my survey of
Santa Cruz Island (1977a), there were 13 dialects and measurements were made for 77
out of the possible 169 pairings, or 45% of the possible cases (Appendix 2.2, Table
2.2.6). In Kirk's Mazatec study in Mexico (Kirk 1970, Casad 1974:31), which
involved 23 speech communities, intelligibility was tested for only 130 out of 529
possible pairings, or 25%. A predicting model can be used to estimate the
intelligibility scores which are not actually tested.

This is of advantage not only for the sake of having a complete table of
intelligibility relations to refer to, but is necessary if the analysis methods
described in Chapter 3 are to consider all possible solutions. As noted already in
Section 3.4, the method developed by Grimes (1974) has as one of its advantages that
it does not require a complete matrix of values (1974:261). However, it has the
disadvantage that when a value is missing, that particular reference point is
excluded from serving as the center for that test point. This would have serious
consequences in analyzing the results of the Santa Cruz survey, for instance, where
intelligibility with the two central dialects (BAN and LWO) was seldom tested.
There, intelligibility with the central dialects was a foregone conclusion based on
the tests of more distant dialects. Unless values for the unmeasured
intelligibility relations can be estimated, the analysis of communication centers
may be skewed in the direction of those reference points most commonly tested for.
Fortunately, a predicting model can be used to estimate the missing values and thus
avoid this problem.

Finally, the predictive capability of a mathematical model may ultimately
afford a more accurate estimate of intelligibility than intelligibility testing
itself. A major unanswered problem with intelligibility testing is that of the
adequacy of the text and the questions used for a particular test as a sample of the
whole language. A short text can represent only an extremely small portion of the
whole grammar and lexicon of the language. Even if all the problems associated with
subject aptitude, subject screening, emotional reaction of the subject, and
bilingual communication between the investigator and subject were completely absent
or controlled for, there would still be no guarantee that the degree of
intelligibility measured on the test was a good estimate of degree of understanding
of the whole language. I feel that it is this point which requires the greatest
faith in accepting intelligibility test results. If we understood the factors which
underlie intelligibility well enough to construct a good predicting model, then that
model could give predictions of intelligibility which were less skewed by the
problems of subject and language sampling. Ultimately, predicting intelligibility
may be more accurate than measuring it.

In the final analysis, the predicting approach may not actually replace the
intelligibility testing approach, at least not until we better understand the
factors underlying intelligibility. For the present, the two approaches are
complementary, with predicted intelligibility (and informant opinions) serving as a
backup to measured intelligibility and filling in for values that were not tested.
In this way, less reliance on the measured intelligibility scores is required.
Furthermore, the predicted scores can serve to point out measurement errors.

4.2 The state of the art

The development of modeling approaches in the social sciences is far behind its
development in the physical sciences. In the physical sciences, a great many models
have stood the tests of time and repeated confirmation, and have been elevated to
the status of "laws", like Newton's laws of motion or Ohm's law. In the social
sciences we are only beginning to use mathematical models to describe social
phenomena.

John Q. Stewart, a proponent and developer of a field of study which he calls
"social physics", traces the development of modeling in the physical sciences and
draws parallels in the social sciences. His social physics is an attempt to
show that many sociological phenomena can be defined in terms of mathematical
models, many of which are analogous to physical laws. He contrasts the current
stage of development in the social and physical sciences as follows (1952:110):

    Merely verbal logic which traces back to Aristotle still comprises
    the sole intellectual equipment of too many practitioners of social
    disciplines, although physical science freed itself of those same archaic
    bonds as early as the seventeenth century.

Stewart traces the development in the physical sciences, and the parallels in
the social sciences, in the following way (1947a:461):

    There was a time when scholars did not realize that number had the
    principal role in the description of the phenomena of physics. The
    transition from medieval to modern science was made in celestial
    mechanics, in three stages. These can be concisely represented by Tycho
    Brahe's extensive observations of planetary motions, Kepler's faith in
    mathematics as a means of insight into phenomena, and Newton's progress
    from Kepler's empirical rules for the solar system to the mechanics of the
    entire universe.

    We are now seeing a similar development in the social studies.
    Astonishing amounts of significant numerical data have been accumulated by
    conscientious social statisticians. Publications of the Bureau of the
    Census, for example, are comparable in extent and variety with catalogues
    of stars or tables of spectroscopic wave lengths, even if the numerical
    precision necessarily is much less. Thus the observational stage is well
    advanced. A few investigators whose training is not confined to the
    social fields are beginning to proceed with the condensation of the
    voluminous sociological data into concise mathematical rules. The final
    rational interpretation of such empirical rules cannot come until after
    the rules themselves are established.

The three stages in the advance can be summarized as: (1) the collection of
quantitative observations by Tycho Brahe, (2) their condensation into empirical
mathematical regularities by Kepler, and (3) theoretical interpretation of the
latter by Newton (from Stewart 1947b:179).

In the investigation of communication between speech groups, not even the first
stage is well advanced. Quantitative observations on lexical cognate percentages
between dialects all over the world are numerous, but quantitative observations on
other aspects of linguistic relationships (such as phonology, grammar, and
semantics) are scant. Quantitative observations of intelligibility between dialects
are also rare, and observations on the social relations between dialects even more
so. In Chapter 5 and Appendix 1 I gather all of the quantitative observations I
could find in published and unpublished sources where both the percentage of
intelligibility and the percentage of cognates are available. I could find such
data from only ten language surveys around the world, a total of 245 observations.
That is not enough data from which to derive universal laws, but it is enough to
demonstrate that there are mathematical regularities in the relationship between
intelligibility and lexical similarity.

The thrust of the next two chapters is along the second stage of development,
namely, the condensation of observations into empirical mathematical regularities.
I have encountered skeptics who feel that human relationships, such as communication
between dialects, cannot be described in mathematical terms, because human behavior
involves too many unknowns and irregularities. I trust that the empirical studies
in Chapters 5 and 6 are sufficient to show that mathematical description is
feasible, that the regularities are strong, and that the remaining unknowns play
only a minor role. The third stage, that of interpreting the mathematical
formulations and generalizing to universal laws, must wait until more observations
from all over the world are available.

Before proceeding to present my own work in building models for explaining
communication, I will report what others have done previously. In the first stage
of model development, that of collecting quantitative observations, I am aware of
only the following investigators who have reported quantitative observations on both
intelligibility and linguistic similarity: Marvin Bender and Robert Cooper (1971)
for Sidamo in Ethiopia, Bruce Biggs (1957) for Yuman in the United States, Eugene
Casad (1974:78-81, 191-2) for Trique in Mexico, David Glasgow and Richard Loving
(1964) for the Maprik area in Papua New Guinea, Warren Harbeck and Raymond Gordon
(Harbeck ms [1969]) for Siouan in the United States and Canada, Peter Ladefoged
(1968, Ladefoged and others 1972) for Bantu in Uganda, and Gillian Sankoff (1968,
1969) for Buang in Papua New Guinea. The data from all of the above studies, except
from Glasgow and Loving, are reproduced in Appendix 1. Glasgow and Loving made
impressionistic judgments of "mutual" intelligibility rather than actually testing
intelligibility in both directions. All these investigators report lexical cognate
percentages as a measure of linguistic similarity. Bender and Cooper (1971)
consider some grammatical relations as well, while Ladefoged (1968, 1970) quantifies
phonological relations. Only three of the investigators (Casad, Ladefoged, and
Sankoff) give any observations of relevant social factors, and only Casad
(1974:191-2) quantifies these.

Only two investigators have entered into the second stage of model development,
that of condensing the observations into mathematical regularities. Ladefoged
computed the best fitting linear model for explaining his data and plotted it in a
scattergram of the data points (Ladefoged and others 1972:76). Casad (1974:191-2)
developed a linear model in three variables to explain intelligibility relations
among five Trique dialects. The three variables are lexical similarity, intensity
of contact, and location of contact. (The other two terms in his equation, the
bilingualism factor and the error factor, are treated as constants.) The model fits
the data very closely (97% explained variation, see Section 4.4). However, the
model does not fit well with theoretical expectations. When there is no similarity
and no contact, 6% intelligibility is predicted. Where there is 100% similarity and
no contact, 76% intelligibility is predicted. Where there is no similarity and
complete contact, 46% is predicted. When there is complete similarity and complete
contact, 121% is predicted. Our theoretical expectations for these four boundary
conditions would be 0%, 100%, 100%, and 100% respectively.

The work of Bender and Cooper (1971) should also be mentioned in this respect.
Though they did not actually build models, they did explore regularities in the
relationships between intelligibility, lexical similarity, grammatical similarity,
and geographic proximity by computing correlation coefficients (see Section 4.4).
The results showed that intelligibility correlated more highly with lexical
similarity and geographic proximity than with grammatical similarity. Grammatical
similarity was measured as the proportion of grammatical morphemes shared in
translations of the same text (1971:42).

The third stage in model development, that of interpreting the results of stage
two and generalizing to universal laws, has not been reached. There are
publications, however, in which general models for explaining communication have
been suggested. The models are not backed up by empirical validation and must
therefore be viewed as exploratory.

The most elaborate of these is offered by Casad in an appendix to his book,
Dialect Intelligibility Testing (1974:185-193). In his model, five independent
variables underlie intelligibility: (1) degree of linguistic similarity, (2) history
of intragroup relations, (3) socioeconomic relations, (4) alternatives for language
use, and (5) relative size of the groups. Five dependent variables intervene
between the independent variables and intelligibility: (1) nature of intragroup
contact, (2) societal attitudes, (3) language attitudes, (4) type of bilingualism,
and (5) degree of bilingualism. The model is specified in terms of a directed graph
which charts the cause and effect relations among the ten variables and
intelligibility (1974:186). Twenty-six axiomatic propositions implied by the model
are enumerated and sample theorems that can be derived from the axioms are given.

Ken Collier (1977) has proposed a simpler model. He suggests that
intelligibility is an additive function of linguistic similarity and propensity to
learn. The propensity to learn factor is a combination of two aspects of social
relations, contact between dialects and the attitudes speakers of dialects have
toward the other dialects. The paper includes suggestions on how the contact and
attitude variables might be measured. Ronald Stolzfus (1974:43, 46) briefly
suggests a similar model. He states that intelligibility results from the effect of
linguistic similarity, or the effect of intergroup language learning, or the sum of
both.

Two models which have been suggested for a closely related phenomenon, language
change, are also relevant to the question of explaining communication. This is
because the same kinds of variables which explain communication and language
learning also seem to explain the borrowing aspect of language change. Again, these
models are not backed up by empirical validation. The first was offered by Olmsted
(1954b). His model predicts the likelihood that a single word will be understood
and adopted by a single speaker. He suggests that this likelihood is an increasing
function of the following factors: the degree to which the word is phonemically and
morphemically regular in the hearer's system, the difference in social status of the
speaker over the hearer, the upward social mobility of the hearer, the frequency of
interaction between the speaker and hearer, and the frequency of occurrence of the
word. He sums up the proposed model by saying that "the indispensables for lexical
innovation are pronounceability and opportunity" (1954b:115). In analogy to the
models for explaining communication, these two indispensables are similarity and
contact.

Istvan Fodor (1965) has written a monograph entitled The Rate of Linguistic
Change in which he develops a model for explaining language change. He discusses
six factors involved in language change (pages 19-40): the historic effect, the
cultural effect, the social effect, the geographic effect, the effect of neighboring
and distant foreign peoples, and the role of national character. In addition he
discusses possible ways of measuring language change by quantitative methods (pages
41-58) and a mathematical model of the rate of linguistic change (pages 59-73).

4.3 A basic model for explaining communication

Everyone who has tried to explain communication agrees on at least one thing,
that two main factors play a key role in determining the presence or absence of
communication: language variation and the social setting. On the one hand, the
degree of intelligibility between two dialects is related to linguistic similarity.
The greater the similarity, the greater the intelligibility is likely to be;
conversely, the lower the similarity, the lower the intelligibility is likely to be.
On the other hand, the degree of intelligibility is related to the social setting in
which the communication occurs. If the social situation is favorable, contact and
learning will lead to a boost in intelligibility. If the social situation is not
favorable, it will tend to limit intelligibility. Thus intelligibility can be
viewed as composed of two components: a linguistic, or similarity-based, component
and a social, or contact-based, component. That is,

    total intelligibility =
        similarity-based intelligibility +
        contact-based intelligibility

This formulation with a simple addition is oversimplified; however, it serves as a
useful starting point for discussion.




It is on the specifics of what factors go into each of the two components of
intelligibility, how these factors can be measured, and how the components interact,
that investigators have differed. The discussions of the subject mentioned in
Section 4.2 have been largely exploratory and based on little supporting evidence.
The next two chapters consider both of these components and demonstrate with
empirical evidence how they can be measured and built into models for explaining
communication.

4.4 Some statistical preliminaries

Formulating and testing mathematical models involves the use of statistics. In
this section, the basic statistics referred to in the next two chapters are briefly
defined. For a complete discussion of these statistics and how they are computed,
the reader is referred to a basic text on statistics, such as Blalock 1972,
Darlington 1975, or Downie and Heath 1974.



The standard statistical techniques of least-squares regression and correlation
form the basis of the analysis in Chapter 5. In a nutshell, these techniques are
methods for using the values of one variable to predict the values of another. The
variable which is predicted is called the dependent variable, and a variable used as a
predictor (there may be more than one) is called an independent variable. The
techniques are easiest to visualize if the data are plotted on a two-dimensional
graph. Figure 4.1 gives an example (it is copied from Figure 5.8 in Section 5.2.6).
In this figure the percentage of intelligibility is plotted on the vertical axis
while the percentage of lexical similarity is plotted on the horizontal axis. Each
case in the data consists of a pair of observed values, the intelligibility from one
dialect to another and the percentage of cognates they share. In the graph a dot is
placed where the paired values of intelligibility and lexical similarity intersect.
The plotted points are scattered within the graph and for this reason such a graph
is called a scattergram. Note, however, that the scattering is not random; there is
a pattern.

Regression analysis is used to fit a curve to that pattern. When performing
regression analysis, one first selects the desired shape of the curve; that is,
whether it will be a straight line, a parabola, an exponential growth curve, and so
on. The analysis then determines the parameters for the curve of that shape which
best describes the data. Most of the regressions performed in Chapter 5 are
linear. Linear regression finds a single straight line which best fits the pattern
of the data points. In doing so, it finds the straight line passing through
the data points in such a way that the average square of the distance of the data
points from that line is the least possible. This line is called the regression
line. Figure 4.1 illustrates the best fitting linear regression line for the given
data points. It can also be thought of as a prediction line; the predicted value of
the dependent variable can be read from the intersection of the regression line with
the given value of the independent variable.
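
As a concrete illustration of linear regression, the following minimal sketch fits
a least-squares line in plain Python. The data are hypothetical percentages, not
values from any survey, and the function names are my own. The line is fitted in
the usual way, by minimizing squared distances measured vertically, that is, in
the dependent variable.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Read the predicted value of the dependent variable off the regression line."""
    return slope * x + intercept

# Hypothetical data: lexical similarity (%) and intelligibility (%).
similarity = [60, 70, 80, 90]
intelligibility = [42, 53, 72, 83]

slope, intercept = fit_line(similarity, intelligibility)
print(slope, intercept)
print(predict(75, slope, intercept))   # point estimate at 75% similarity
```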

Correlation analysis measures the amount of scatter about the regression line.
It is therefore used to assess the goodness of fit of the line and the model it
represents. The correlation coefficient used in this analysis is the Pearson
product-moment correlation coefficient, symbolized with r. The absolute value of
this coefficient ranges from zero for no correlation to one when all of the data
points lie exactly on a straight line. Thus, when the points cluster close to the
regression line, the correlation coefficient approaches one. When the points are
scattered far from the line, the correlation coefficient approaches zero.
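
The Pearson coefficient can be computed directly from its definition. The sketch
below is illustrative only; the data sets are hypothetical and chosen to show the
two extremes and one case in between.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [60, 70, 80, 90]
on_a_line = [40, 55, 70, 85]      # every point lies exactly on a straight line
near_a_line = [42, 53, 72, 83]    # points cluster close to a line
no_pattern = [1, -1, -1, 1]       # no linear relationship with xs

print(pearson_r(xs, on_a_line))
print(pearson_r(xs, near_a_line))
print(pearson_r(xs, no_pattern))
```

The first value is one, the second is slightly below one, and the third is zero.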

When the correlation coefficient is less than one, it is an indication that
predictions of the dependent variable made with the regression line are not perfect.
The standard error of estimate measures the amount of prediction error associated
with the predictions. It is used to compute an interval estimate for predictions.
For instance, to say that predicted intelligibility is 80% is to use a point
estimate; to say that it is between 70% and 90% is to use an interval estimate.
When the standard error of estimate is used to compute an interval estimate, the
interval is characterized by a confidence level. The standard error of estimate
itself defines a 68% confidence interval. This means that in 68% of the cases the
true value of the dependent variable is within a range of plus or minus one standard
error of estimate from the predicted value. Doubling the standard error of estimate
defines a 95% confidence interval, within which the true value can be expected to lie
19 times out of 20. For instance, if the predicted value of intelligibility is 70%
and the standard error of estimate is 8%, then we can say that the true value of
intelligibility is within the range of 54% to 86% with 95% confidence. The
multiplicative constants for defining other levels of confidence can be found by
consulting any statistics textbook.
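
The arithmetic of the interval estimate can be checked with a few lines of
Python. The function name interval_estimate and its multiplier parameter are
hypothetical; the multipliers 1 and 2 are the rounded constants quoted above for
the 68% and 95% levels.

```python
def interval_estimate(predicted, standard_error, multiplier=2.0):
    """Return (low, high) bounds around a point estimate.

    multiplier=1.0 gives the 68% interval and multiplier=2.0 the 95%
    interval, using the rounded constants quoted in the text.
    """
    margin = multiplier * standard_error
    return predicted - margin, predicted + margin


# The worked example from the text: prediction 70%, standard error 8%.
print(interval_estimate(70.0, 8.0))        # 95% interval
print(interval_estimate(70.0, 8.0, 1.0))   # 68% interval
```

With a prediction of 70% and a standard error of estimate of 8%, the 95% interval
runs from 54% to 86% and the 68% interval from 62% to 78%.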

The significance of the correlation coefficient offers a means of evaluating
the degree of confidence in the strength of the relationship between two variables.
It tells us how much trust we can put in the correlation coefficient and the
regression line. It is possible that two variables could be totally unrelated but
that the chance distribution of the two randomly related variables would yield a
high correlation coefficient. As the number of data points increases, the
likelihood of a spurious correlation decreases. The significance of the correlation
coefficient is computed as a probability. It is the probability that the value of a
correlation coefficient as large or larger than the one calculated could have arisen
by chance alone, were the two variables in fact uncorrelated. For instance, a
significance level of .001 means there is a one in a thousand chance that the
observed relationship between the variables could be due to chance alone. In the
social sciences, a significance level of .05 or less is generally considered to be
significant. A significance level of .05 is the same as a confidence level of 95%.
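
One way to make this definition concrete is an exact permutation test, sketched
below in Python. This is a swapped-in technique for illustration only and is not
claimed to be the computation behind the significance figures reported in this
study (which would more usually come from a t test or a table). The function
names are hypothetical, the test is one-sided (it counts only correlations as
large or larger in the positive direction), and exhaustive enumeration is
practical only for a handful of data points.

```python
import itertools
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def exact_permutation_significance(xs, ys):
    """Fraction of all re-pairings of ys with xs whose correlation is as
    large or larger than the observed one (one-sided)."""
    observed = pearson_r(xs, ys)
    as_large = 0
    total = 0
    for shuffled in itertools.permutations(ys):
        total += 1
        # Small tolerance guards against floating-point rounding.
        if pearson_r(xs, shuffled) >= observed - 1e-12:
            as_large += 1
    return as_large / total


# A perfect correlation on four points versus five points (invented data).
print(exact_permutation_significance([1, 2, 3, 4], [2, 4, 6, 8]))
print(exact_permutation_significance([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

With four points a perfect correlation arises by chance in 1 of 24 pairings;
with five points, in 1 of 120. This mirrors the remark that a spurious
correlation becomes less likely as the number of data points increases.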

A final statistic for evaluating the strength of relationship between two
variables is the percentage of explained variation. In the data of the next
chapter, measured intelligibility varies from 0% to 100%. At the same time lexical
similarity varies from 0% to 100%. In doing the statistical tests described above,
we are asking, "Can the variation in measured intelligibility be explained by the
variation in lexical similarity?" That is, when lexical similarity goes up, does
intelligibility also go up, and by a proportional amount? By the same token, when
lexical similarity goes down, does intelligibility also go down, and by a
proportional amount? The percentage of explained variation answers these questions
directly. The percentage of explained variation tells how much of the measured
variation in intelligibility is explained by the variation in lexical similarity,
or, what percentage of the ups and downs in intelligibility correspond to ups and
downs in lexical similarity.

In evaluating the adequacy of explaining (or predicting) models, the total
variation in the dependent (or predicted) variable is partitioned into two
components, the explained variation and the unexplained variation. That is,

total variation =

explained variation +
unexplained variation

In the statistical analyses which follow, the total variation in the dependent
variable is measured by its sum of squares: the sum of the squared differences
between the actual values of the dependent variable and its mean value. (When the
sum of squares is divided by the number of cases, the result is a statistic called
the variance. Thus the percentage of explained variation I am using is equivalent
to the percentage of explained variance.) The explained variation is measured by
the regression sum of squares: the sum of the squared differences between the
predicted values and the mean value. The unexplained variation is measured by the
residual sum of squares: the sum of the squared differences between the predicted
values and their corresponding actual values.

The percentage of explained variation is computed by dividing the explained 
variation by the total variation and multiplying the result by 100. When the 
correlation coefficient is squared, the result is the proportion of explained 
variation. Thus another way to compute the percentage of explained variation is to 
square the correlation coefficient and multiply by 100. The percentage of
unexplained variation can be computed by subtracting the percentage of explained
variation from 100%.
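
The partition of variation can be verified numerically. The Python sketch below
is a minimal illustration with invented data and a hypothetical function name
(partition_variation); it fits the least-squares line and returns the three sums
of squares so that the two routes to the percentage of explained variation can
be compared.

```python
def partition_variation(xs, ys):
    """Return (total, regression, residual) sums of squares for the
    least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [slope * x + intercept for x in xs]
    # Total variation: actual values about their mean.
    total_ss = sum((y - mean_y) ** 2 for y in ys)
    # Explained variation: predicted values about the mean.
    regression_ss = sum((p - mean_y) ** 2 for p in predicted)
    # Unexplained variation: actual values about their predictions.
    residual_ss = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    return total_ss, regression_ss, residual_ss


# Hypothetical data: lexical similarity (%) and intelligibility (%).
xs = [60.0, 70.0, 80.0, 90.0]
ys = [30.0, 50.0, 60.0, 80.0]
total_ss, regression_ss, residual_ss = partition_variation(xs, ys)
percent_explained = 100.0 * regression_ss / total_ss
print(total_ss, regression_ss, residual_ss, percent_explained)
```

For these invented points the total variation of 1300 splits into 1280 explained
and 20 unexplained, giving about 98.5% explained variation. Squaring the
correlation coefficient for the same data (about .992) and multiplying by 100
gives the same figure.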

For the problem of explaining intelligibility as a function of lexical 
similarity, the partitioning of variation is as follows:

total variation in intelligibility =

variation explained by lexical similarity + 
unexplained variation 

If there were no unexplained variation, then the model would be complete. Variation 
in lexical similarity would explain all of the variation in intelligibility, and we 
would say that lexical similarity is a perfect predictor of intelligibility.
However, when the percentage of explained variation is less than 100%, then lexical
similarity is not a perfect predictor of intelligibility and the model is
incomplete. A model is complete only if it can account for all the total variation.
To complete the model, we must introduce additional factors to explain the
unexplained variation. If the unexplained variation is small it can be attributed
to measurement error, either in test construction and scoring, in sampling, or in
both. When the unexplained variation is greater, however, measurement error alone
can no longer be used to account for the unexplained variation. At this point it is
necessary to introduce other factors into the predicting model, such as social
factors or other aspects of linguistic similarity, or to change the mathematical
relations in the model, such as from linear to exponential.

In the next two chapters, the attempt is made to explain communication. The 
approach is one of successive refinements. In each chapter, a succession of models
is considered. At each step refinements are made by incorporating new or different
factors or different mathematical relations into the model in order to account for a
portion of the previously unexplained variation, and thus increase the percentage of
explained variation.




CHAPTER 5 

EXPLAINING COMMUNICATION: LINGUISTIC FACTORS 

This chapter considers how the linguistic similarity between dialects affects
the intelligibility between them. In Section 5.1 the discussion covers the general
problems of quantifying linguistic similarity so that it can be incorporated into a
mathematical model. In Section 5.2, an empirical analysis of the relation between
lexical similarity and intelligibility is made. This analysis is based on data
gathered in ten different field studies throughout the world. As a final
conclusion, the possible universal relationship between lexical similarity and
intelligibility suggested by the concurring sets of field data is explored.

5.1 Quantifying linguistic similarity 

The approach of modeling by numerical equation requires that we describe
linguistic similarity numerically. However, linguistic similarity is not an easy
concept to quantify. Languages may differ in their sound systems, their
vocabularies, their grammars, or their semantic systems. Because linguistic
similarity is such a complex relationship, it is impossible to summarize it
completely in one number, at least at the present time. This is one of the motives
behind the early studies of intelligibility. They hoped by testing intelligibility
to discover a means of indirectly quantifying linguistic similarity, or "dialect
distance" as they called it (Pierce 1952, Biggs 1957). However, their perspective
was backwards (Wolff 1959). Intelligibility does not determine linguistic
similarity; rather, linguistic similarity along with other factors determines
intelligibility. Thus the burden falls back on finding a means to quantify
linguistic similarity directly.

Many techniques have been proposed for quantifying specific aspects of
linguistic similarity. The most widely used is lexicostatistics, which measures the
degree of similarity in basic vocabulary between languages. The method was
developed by Morris Swadesh (1950, 1952, 1955, also Lees 1953). Helpful discussions
are given by Gleason (1959), Gudschinsky (1956), Hymes (1960), and Sanders (A.
Sanders 1977a).

A number of methods for quantifying phonological similarity, or
phonostatistics, have been proposed. However, none has gained the widespread use
and acceptance that lexicostatistics has. This is probably because the development
of phonostatistics was nearly ten years later and because phonostatistics is
computationally more complex. The most promising methods have been developed by
Grimes and Agard (1959), McKaughan (1964), and Ladefoged (1970; Ladefoged and others
1972:62-65). Elsewhere I give a review of these and nine other phonostatistic
methods (Simons 1977c).

A few attempts at quantifying grammatical similarity have been made but with
limited success. Again, these methods have not enjoyed a widespread use or
acceptance. In general, these grammatical methods require a good analysis and
understanding of the grammars which are being compared. For this reason they are
not applicable to the language survey situation, unless the investigator has a very
good idea of what the grammars will be like on the basis of comparative study.
Methods of grammatical statistics have fallen into two major categories. The first
computes measures of association between dialects by comparing them for the presence
or absence of key morphological or syntactic features (Kroeber and Chretien 1937,
1939, Ellegård 1959, Simons 1977c: U2-3, see also Capell 1962). The second computes
typological indices which characterize single dialects as to their position along
some dimension of language structuring. For instance, an "index of synthesis"
measures the average number of morphemes per word. Comparisons between dialects are
achieved by comparing their indices (Greenberg 1960, Kroeber 1960, Voegelin and
others 1960, Voegelin 1961, Moore 1961). Bender and Cooper (1971) used a third
method which resembles lexicostatistics more than either of the above typological
methods. Their intelligibility tests were based on six texts that were translated
into each of the six dialects they were testing (see Section 2.2.4). They were thus
able to make morpheme by morpheme comparisons of the translated texts and compute
the percentage of grammatical morphemes (as opposed to root morphemes) which were
the same for each pair of dialects. These measures of grammatical association were
then correlated with measured intelligibility; the results were largely
inconclusive.

Quantifications of semantic similarity have not yet been used by linguists to 
my knowledge. Such a method could follow the first method described above for
grammatical statistics. Each pair of dialects would be compared for the presence or
absence of key semantic oppositions. The work of Berlin and Kay (1969) on color
terms contains the information and analysis necessary to quantitatively compare 98
languages of the world on the semantics of their color terminology. Furthermore,
their work develops a methodology which could be applied for the remaining
languages. Other semantic domains which have been well studied are kinship
terminology and body part terminology. Another possible approach is Charles
Osgood's semantic differential technique, which is a method for quantifying and 
comparing meaning (Osgood and others 1957, Snider and Osgood 1969). 

At the present time, the prospects for a composite quantification of linguistic 
similarity are not good. A number of phonostatistic methods exist, but none has 
been widely used, mainly because the computations are complex. Good techniques for
gathering and quantifying data on grammatical and semantic similarity, at least in 
the dialect survey situation, are still in the future. 

Lexicostatistics remains as the most widespread and readily available means for
quantifying linguistic similarity. The analysis in the next section of this
chapter, especially Section 5.2.5, demonstrates that lexical similarity is a good
predictor of intelligibility and thus must be viewed as a useful approximation to a
measure of linguistic similarity. Nevertheless, many investigators have avoided or
belittled the use of lexicostatistics. There are at least three reasons for this.

First, the pitfalls of glottochronology with its assumptions of a universal 
rate of change and the requirement of independent change, and the ensuing misuse of 
lexicostatistics in studies of linguistic history, have tainted the image of
lexicostatistics. However, if we take lexicostatistics at face value for merely
what it is, "word statistics", it is free from these assumptions and problems.
Under these conditions it actually proves to be an effective predictor of
intelligibility. That is, similarity of basic vocabulary is a more reliable
indicator of intelligibility between languages than it is of the historical time
depth between languages. Elsewhere (Simons 1977d:14-17) I have contrasted the
methods of synchronic lexicostatistics and diachronic lexicostatistics and shown
that the future of the synchronic use of it is bright while that of the diachronic
use is not.

Second, lexical similarity is only one aspect of linguistic similarity. Some
investigators have thus been leery of depending on it to estimate linguistic
similarity. However, the results in the next section indicate that lexical
similarity alone is a good predictor of intelligibility, and therefore approximates
linguistic similarity as well. The results do not suggest that phonological,
grammatical, and semantic similarity are not important, but simply that degree of
lexical similarity parallels the degree of phonological, grammatical, and semantic
similarity. This would imply that change in these other aspects of language tends
to keep abreast of change in vocabulary. This is not always the case, but it
probably averages out. For instance, Grimes (1974:267) has shown that French and
Catalan group more closely with Spanish and Portuguese than with Italian on the
basis of phonostatistics (Grimes and Agard 1959, Grimes 1964) but they group more
closely with Italian on the basis of lexicostatistics (Rea 1958). The reason is
that the one measure is sensitive to a heavy lexical borrowing in French from
Italian around the Renaissance period, while the other measures sound change.
However, for the rest of Romance, the two groupings agree.

Finally, it has been suggested that lexicostatistic measures are not as
appropriate as phonostatistic measures in assessing linguistic similarity for
ranges of language divergence where intelligibility is still expected. McKaughan
(1964), in an analysis of linguistic relations among a number of dialects in the New
Guinea highlands, used three methods: lexicostatistics, phonostatistics, and
structural comparison. In conclusion he suggested that each method was most useful
within certain ranges of linguistic divergence: phonostatistic methods are most
applicable where there is slight divergence, lexicostatistic methods where there is
moderate divergence, and structural comparisons where there is wide divergence
(McKaughan 1964:118). Ladefoged (1968:5, Casad 1974:118-9) has suggested that since
we expect intelligibility only between highly similar dialects, phonostatistic
methods may be more useful than lexicostatistic or grammatical methods in predicting
intelligibility. On the basis of these suggestions and possibly the other two
reasons for avoiding lexicostatistics already given, Casad (1974:118-119) does not
even consider lexicostatistics in his chapter on alternative approaches for
assessing intelligibility.

The results in Section 5.2 do not prove or disprove McKaughan's hypothesis.
They do show, however, that any assumption that lexicostatistic measures are not
useful enough within the range of linguistic divergence appropriate to the range
of intelligibility is ill founded.

5.2 Lexical similarity and intelligibility

5.2.1 Overview of the data and method

This study of the relationship between lexical similarity and intelligibility
is based on ten field studies conducted in various parts of the world. These
studies were conducted by ten investigators in ten different language groups. The
groups span three continents: Africa, Oceania, and North America. The specific
areas involved are Ethiopia, Uganda, Papua New Guinea, the Polynesian islands,
Mexico, Canada, and the United States. Not only were the circumstances of each of
the studies different; so were the methodologies. In spite of all these
differences, the degree of convergence between the results of all these field
studies is very striking. Section 5.2.4 shows that eight of these studies point to
almost exactly the same underlying relationship between lexical similarity and
intelligibility.

In each field study the percentage of intelligibility between dialects in the
study area was measured. Corresponding to each measurement of intelligibility is a
measure of the lexical similarity between the same two dialects. These measures are
expressed as a cognate percentage. Each pair of measurements, an intelligibility
percentage with a cognate percentage, is treated as one case in the statistical
analysis. The smallest study contains only nine cases, while the largest study
contains seventy-seven. The average size is twenty-four cases. The complete
details about each study, including the sources of the data, some notes on the
methodologies used, and a listing of the raw data are found in Appendix 1.

The analysis begins in Section 5.2.2 by examining the results obtained from the
raw data. In Section 5.2.3 the analysis is refined by removing some of the effects
of social factors from the predicting model. In Section 5.2.4 the prediction of
intelligibility by lexical similarity is further sharpened by adjusting the
intelligibility scores for measurement error. Final conclusions are drawn in
Section 5.2.5. In Section 5.2.6 the data from the different field studies are
pooled and possible models for the universal relationship between intelligibility
and lexical similarity are explored.

Except for Section 5.2.6, the method of linear regression is used throughout
the analysis to find the relationship between similarity and intelligibility. This
makes the assumption that the relationship between the two variables is a linear, or
straight line, one. A straight line plot says that a given amount of increase in
lexical similarity will give the same increase in intelligibility at any point along
the intelligibility scale. There is no theoretical reason why we should expect this
to be the actual case. For instance, the factor of redundancy would suggest that an
increase in similarity would have less and less of an effect on intelligibility as
the intelligibility neared 100%. However, the scattergrams in Appendix 1.3 show no
consistent hint of nonlinearity. Thus linear techniques were used in the analysis
since they are computationally the simplest. The assumption of linearity is a
weaker one than the assumption of nonlinearity and is thus appropriate for a first
approximation. The use of nonlinear techniques should increase, not decrease, the
degree of fit of the models. In Section 5.2.6, the data from eight studies are
pooled and nonlinear relationships are explored. Nonlinear models turn out to offer
a slight, but not statistically significant, improvement over the linear model for
the current data.



5.2.2 Results from the raw data 

The data have been briefly described already in Section 5.2.1. In Appendix
1.1, each of the ten sets of data is described more fully. In Appendix 1.2, all of
the data are listed. In Appendix 1.3, a scattergram showing the distribution of
intelligibility versus lexical similarity is plotted for each of the ten studies.
Below each scattergram, the following figures (see Section 4.4) are listed: the
number of cases, the correlation coefficient, the significance, the standard error
of estimate, and the percentage of explained variation. In addition, the line of
best fit given by the regression analysis on the full data is drawn into the
scattergram as a solid line. The formula for this line is given at the base of the
scattergram.

From the formula for the regression line, it is possible to compute two other
helpful quantities. The first is the predicted value of intelligibility when
lexical similarity is 100%; the second is the value of lexical similarity when the
predicted value of intelligibility is 0%. The first quantity gives a measure of
naturalness for the prediction equation. The regression line should predict 100%
intelligibility when lexical similarity is also 100%, so the nearer the predicted
value is to 100%, the more natural the equation. The second quantity,
the value of lexical similarity for a predicted intelligibility of 0%, offers a
means of comparing the convergence of the ten different studies. Ideally, at the
upper end of the regression line, the lines for all ten studies should converge on
the point at (100%, 100%). At the lower end, however, where the lines intersect the
similarity axis, the lines fan out, indicating the differences between studies. The
points at which the predicting lines intersect the similarity axis give a good means
of comparing the degree to which the regression lines from the different studies are
the same or different.
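
Both quantities follow directly from the slope and intercept of a regression
line. The Python sketch below shows the arithmetic; the function names lex_100
and int_0 simply echo the column labels used in the summary table, and the slope
and intercept values are invented for illustration rather than taken from any of
the ten studies.

```python
def lex_100(slope, intercept):
    """Predicted intelligibility (%) when lexical similarity is 100%."""
    return slope * 100.0 + intercept


def int_0(slope, intercept):
    """Lexical similarity (%) at which predicted intelligibility is 0%,
    i.e. where the regression line crosses the similarity axis."""
    return -intercept / slope


# Hypothetical regression line: intelligibility = 1.6 * similarity - 65.
slope, intercept = 1.6, -65.0
print(lex_100(slope, intercept))  # close to 95
print(int_0(slope, intercept))    # close to 40.6
```

For this invented line the prediction at full lexical similarity is about 95%,
and the line crosses the similarity axis at about 40.6% similarity.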

In Figure 5.1 the regression lines from the ten different scattergrams in
Appendix 1.3 are superimposed on the same graph. Note that all ten studies show the
same general trend, a regression line which starts in the lower left and rises to
the upper right. There is a general convergence toward the (100%, 100%) point;
however, it is not very strong. The predicted values of intelligibility for 100%
lexical similarity range from 68% to 102%. This explains most of the crisscrossing
of the prediction lines.

In Figure 5.2 the key statistics from Appendix 1.3 are compiled into a summary
table. For each of the ten studies, the following figures are given: the number of
cases (N), the percentage of explained variation (%EV), the correlation coefficient
(Corr), the significance of the correlation coefficient (Sig), the standard error of
estimate (SEE), the predicted intelligibility for 100% lexical similarity (Lex-100),
and the lexical similarity for 0% predicted intelligibility (Int-0). In the top
portion of the table the figures for each of the ten studies are given; in the
bottom portion they are summarized. Four figures are given in the summary: the
minimum observed value, the maximum observed value, the mean (or the average) of the
ten observed values, and the standard deviation from the mean. The standard
deviation is a measure of dispersal around the mean. Roughly speaking, it tells the
average amount by which the observed values differ from the mean.
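
The four summary figures can be computed with Python's standard statistics
module, as in the sketch below. The function name summarize and the eight values
are hypothetical, and the population form of the standard deviation (pstdev) is
an assumption; the sample form (stdev) would give a slightly larger figure.

```python
import statistics


def summarize(values):
    """Return the minimum, maximum, mean, and standard deviation of a list."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # Population standard deviation; an assumption for this sketch.
        "sd": statistics.pstdev(values),
    }


# Hypothetical percentages, not figures from the summary table.
print(summarize([62.0, 64.0, 64.0, 64.0, 65.0, 65.0, 67.0, 69.0]))
```

For these invented values the mean is 65 and the standard deviation is 2.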

The data can be summarized as follows. The ten studies contain, on average, 24
cases. The percentage of explained variation ranges from 18% to 97% with an average
of 65%. The average correlation coefficient is .79110. In only one study, Biliau,
is the significance doubtful; in all other cases the probability of a spurious
correlation is less than one in a thousand. The average standard error of estimate
for predictions of intelligibility is 13%. The predicted values of intelligibility
for 100% lexical similarity range from 69% to 102%, with an average of 90%. The
standard deviation of 811 for the points at which the regression lines cross the 
similarity axis, gives an indication of how scattered the prediction lines based on 
raw data are. 



5.2.3 Controlling for nonsymmetric social factors

In the average case, lexical similarity alone accounts for 65% of the variation
in raw intelligibility scores; 35% of the variation in intelligibility remains
unexplained. In this section, almost one half of this unexplained variation is
attributed to social factors, specifically to the effects of nonsymmetric social
relations which can be observed in the intelligibility data.

Figure 5.2 Statistics for full raw data

A basic model for explaining intelligibility has already been introduced in
Section 4.3. There it was suggested that intelligibility has two components, a
linguistic similarity-based component and a social contact-based component. In
terms of a partitioning of variation this model can be expressed as,

total variation in intelligibility =

variation explained by linguistic factors +
variation explained by social factors

In the previous section, we investigated only the contribution of linguistic
similarity (specifically, lexical similarity) to explaining intelligibility. It
then follows from the preceding formula that the variation due to social factors is
as yet a component of the unexplained variation.

The data do not include measurements of relevant social factors; therefore, it
is not possible to do a full investigation of the contribution of social factors.
However, there is one property of intelligibility which points to the presence of
social factors and that is nonsymmetry. Dialect A may understand B better than B
understands A, or vice versa. According to our basic model this must be explained
by the presence of nonsymmetric relations of linguistic similarity or nonsymmetric
social relations. Lexical similarity, our current approximation to linguistic
similarity, is a symmetric measure. That is, the percentage of cognates from B to A
is always the same as that from A to B. If there are any nonsymmetric linguistic
factors these also would appear in the model in the unexplained category.

There are therefore two possible hypotheses: that nonsymmetric intelligibility
relations are explained by nonsymmetric linguistic relations or by nonsymmetric
social relations. I am assuming in these data that they are due to nonsymmetric
social relations. The sources do provide some evidence for this, while they provide
no evidence for the alternative hypothesis that nonsymmetric intelligibility is
explained by nonsymmetric linguistic relations. Of the ten studies, only Sankoff
addresses the latter possibility but concludes that there is no basis for accepting
the hypothesis. She observes that for the Buang data, explaining "non-reciprocal
intelligibility ... on the basis of phonetic differences between the codes gives
equivocal results" (1969:847, 1968:183). Other writers, for instance Wurm and
Laycock (1961:129-132) and St. Clair (1974a:93-5, 1974b:146-7), have attempted to
explain nonreciprocal intelligibility in terms of asymmetric linguistic relations,
but their evidence is impressionistic rather than empirical. While I do not deny
that linguistic relations contribute to nonreciprocal intelligibility, the evidence
which demonstrates the extent to which they do is presently lacking.

The sources do give evidence for nonsymmetric intelligibility caused by
nonsymmetric social relations. For the Biliau data, which were collected by my wife
and myself, intelligibility relations in the direction of Biliau village are greater
than those directed away from Biliau. This is because that village is the political
and economic center for the region. At Biliau are located an airstrip, a harbor, a
primary school, a medical clinic, and a mission station. For the Buang data,
Sankoff (1969:847) notes that the nonsymmetric intelligibility is explained by
contact arising from travel routes down the river valley toward the government
station. For the Ugandan data, Ladefoged, Glick, and Criper (1972:76) observe that
in the one case of nonsymmetric intelligibility, the better understood dialect "is
spoken in the capital of the country, and has more time on the radio than any other
Ugandan language."

The presence of these nonsymmetric social relations shows up in the
intelligibility relations as significantly different scores for communication in
both directions between the same two dialects. In such cases we assume that the
higher intelligibility score of the pair is boosted by nonsymmetric social
relations (that is, boosted by contact and learning). By removing cases where this
boosting is detected, it is possible to control for the contribution that
nonsymmetric social relations make to explaining intelligibility.



Nevertheless, there still remain cases where a social relation that is
symmetric can boost intelligibility in both directions and go undetected by this
method. A good example is the Biliau data. In that study the two most divergent
dialects are only three hours' walking distance away and there is a lot of contact
in both directions. These cases must be relegated to the category of unexplained
variation.



A more complete model for the decomposition of variation in intelligibility is
now,

total variation in intelligibility =

variation explained by lexical similarity +
variation explained by nonsymmetric factors +
unexplained variation

where unexplained variation includes nonlexical aspects of linguistic similarity,
symmetric social relations, and measurement error.

This model suggests that if the effects of nonsymmetric social factors can be
controlled, then unexplained variation will decrease. This hypothesis can be tested
with the data from the ten field studies. The method used is to remove cases from
the sample in which a boosting of intelligibility due to nonsymmetric social factors
is suspected, and then to repeat the correlation and regression analysis. Such
cases were found by inspecting the data. First the symmetric pairs of cases were
found. A symmetric pair of cases is two cases which measure communication in both
directions between the same two dialects. If one of the intelligibility scores in
the symmetric pair is significantly higher than the other, then that case is dropped
from the sample. To judge a significant difference, it was not possible to make
tests of significance since the reported data do not contain standard deviations for
the intelligibility measurements. Instead a simple rule of thumb was used: if one
score was 10% or more greater than the other, then it was considered to be
significantly higher. The cases thus removed from the sample are indicated in
Appendix 1.2 by an "X" in the "Excluded" column. In the scattergrams in Appendix
1.3, the excluded points are plotted as "X," while the remaining points are plotted
as circles. Examination of the scattergrams shows that the excluded points, in
general, lie well above the regression line. There are points, however, which are
further from the regression line than the excluded points. These are probably
examples of undetected symmetric social factors which boost intelligibility.
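
The exclusion rule described above can be sketched in a few lines of Python. This
is only an illustration, not the procedure actually run for the thesis: the record
layout (the keys "hearer", "source", and "score") and the function name are
hypothetical, and the 10% rule is read here as a difference of ten percentage
points, which is an assumption.

```python
def split_by_asymmetry(cases, threshold=10.0):
    """Separate cases suspected of a nonsymmetric intelligibility boost.

    Each case is a dict with hypothetical keys:
      'hearer' - the dialect of the test subjects
      'source' - the dialect of the recorded text
      'score'  - the intelligibility score (percent)
    A case is excluded when the reverse case exists and this score is
    at least `threshold` percentage points higher than the reverse score.
    """
    scores = {(c["hearer"], c["source"]): c["score"] for c in cases}
    kept, excluded = [], []
    for c in cases:
        reverse = scores.get((c["source"], c["hearer"]))
        if reverse is not None and c["score"] - reverse >= threshold:
            excluded.append(c)   # the higher member of a lopsided pair
        else:
            kept.append(c)       # unpaired cases and balanced pairs stay
    return kept, excluded


cases = [
    {"hearer": "A", "source": "B", "score": 90.0},
    {"hearer": "B", "source": "A", "score": 60.0},
    {"hearer": "A", "source": "C", "score": 70.0},
    {"hearer": "C", "source": "D", "score": 50.0},
    {"hearer": "D", "source": "C", "score": 55.0},
]
kept, excluded = split_by_asymmetry(cases)
```

In this made-up sample only the A-hears-B case is excluded; the C/D pair differs by
five points and is kept, and the unpaired A-hears-C case cannot be judged at all.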

In the scattergrams in Appendix 1.3, a second regression line is drawn in as a
dashed line. This is the regression line for only those points plotted as circles.
Below the scattergrams two sets of statistical computations are given. The first
set is for all of the data points; the second set is for the circle points only,
the points which remain when the "X" points are excluded. The statistics computed
for the data with exclusions are compiled into a summary table in Figure 5.4. The
format of this table is the same as that of Figure 5.2 explained previously. In
Figure 5.3 regression lines for the ten studies with the "X" points excluded are
superimposed in one graph. This graph parallels Figure 5.1.

The effect on the results of controlling for nonsymmetric social factors can be
seen by comparing Figure 5.4 with Figure 5.2. The average number of cases is
reduced from 24 to 20; thus, on average, four cases were removed from each study.
The change in percentage of explained variation is substantial; it rises from 65% to
81%. This 16% additional explained variation supports the original hypothesis that
nonsymmetric social factors are an important element in explaining intelligibility.
The other measures of predicting accuracy and reliability show comparable
improvements: the correlation coefficients increase by 10% on average, the
average significance improves nearly ten times, and the standard error of estimate
decreases from 13% to 10%. There is no significant change in the average predicted
value of intelligibility for complete lexical similarity; it is still below 90% with
a standard deviation exceeding 10%. There is, however, an improvement in the degree
to which the prediction lines of the different studies fan out; this is seen by the
decrease from 84 to 40 in the standard deviation of the point at which the lines
cross the similarity axis.



5.2.4 Controlling for intelligibility measurement error

Not all aspects of measurement error need be classified as unexplained
variation. One aspect of measurement error in intelligibility scores can be
adjusted for. In the administration of intelligibility tests, subjects seldom get a
perfect result on the test from their own dialect. However, they theoretically
should understand their own form of speech perfectly. When test results indicate
that they do not, these results are best interpreted as pointing to deficiencies in
the abilities of the subject, in the construction of the test, or in the
administration of the test. It is possible to control for these kinds of
measurement errors by adjusting raw intelligibility scores on the basis of
performance on the hometown test. The kinds of measurement errors which still lie
beyond the reach of such adjustments are sampling errors; that is, those which have
to do with how well the group of subjects represents the whole community and how
well the text represents the language as a whole.

The need for hometown score adjustments in the data of the ten field studies is
seen in Figure 5.5. This table shows the distribution of hometown scores for each
of the ten field studies. The first column gives the lowest measured hometown
score, the second column gives the highest hometown score, and the third column
lists the average hometown score. Note that in the case of the Buang study the
average hometown score is only 69%. Taken at face value, this suggests that the
Buang villager can understand only 69% of what his neighbor says to him. This
obviously is not true. On the other hand, in the Trique study the average hometown
score is as high as 98%. The last row of Figure 5.5 shows that overall the hometown
scores range from a low of 44% to a high of 100% with an average hometown score of
90%. The wide difference in average hometown scores between individual studies
accounts for the scattering of the regression lines in Figures 5.1 and 5.3 in the
top right hand corner of the graphs. Theoretically, all the lines should converge
on the point (100%, 100%). However, because the average hometown score varies from
69% to 98%, so do the predicted values of intelligibility when lexical similarity is
100%.

Figure 5.4 Statistics for raw intelligibility

Figure 5.5 Distribution of hometown scores

Adjusting the raw intelligibility scores in such a way that hometown scores are
raised to 100% has two effects. First, by compensating for measurement error it
reduces the amount of unexplained variation in the model. Second, and most
important, it makes the results of different field studies more comparable. When
the hometown scores in the Buang study average 69%, while the hometown scores in the
Trique study average 98%, it is very difficult to compare the two studies to see if
they suggest a common trend. However, when all of the intelligibility scores are
adjusted to raise the hometown scores to 100%, the results of all the different
studies are put on the same scale of measurement. They can then be compared
directly to one another and the cases from the different studies can even be joined
into one large set of data. The effect in the plot of regression lines (shown in
Figure 5.6) is that the lines converge much more sharply toward the (100%, 100%)
point.

The discrepancy between the hometown score and 100% can be attributed to one of
three things: a learning curve, the subject's abilities, or test deficiencies.
Depending upon the source of the discrepancy, three different methods can be used
to adjust the intelligibility scores to normalize the hometown scores to 100%. The
three sources of discrepancy and the methods used to compensate for them are as
follows:



(1) Learning curve - The hometown test should be the first test which a subject
takes. This is so he can learn to take the test without having to contend with
dialect differences at the same time. In spite of efforts to explain how the
testing will be done and of even having a preliminary warm-up test, it could be the
case that the subject was still learning how to take the test when he took the
hometown test. This could result in errors on the hometown test. We may be able to
assume that these errors affect only the hometown test and by the time the subject
gets to the second test there will be no more such errors. The solution for
adjusting intelligibility scores in this case is to raise all the hometown scores to
100% while leaving the remaining scores unchanged. This method of adjusting is
particularly appropriate when hometown scores are very nearly 100%. Casad (1974:32)
has suggested that as results from intelligibility testing become so reliable that
hometown scores do approach 100%, this kind of adjustment is most appropriate.

(2) Subject abilities - It could be that the subject was forgetful or
unintelligent or uncomfortable in the testing situation. If this were the case we
would expect these kinds of factors to affect not just the hometown test, but all
tests which that subject took. The solution then would be to adjust all of a
subject's (or group of subjects') scores on the basis of the score received on the
hometown test. That is, the hometown score will be raised to 100% and all other
scores will be raised in a comparable manner. The rationale behind such an
adjustment is that no subjects should be expected to do better on an intelligibility
test than they did on their own hometown test.

(3) Test deficiencies - It could be that the text on which the test was based
was difficult in subject matter, that the recording was of a poor quality, that
questions were improperly phrased, or that the text was segmented in inappropriate
spots. If this were the case, the deficiencies in the test would affect not only
the hometown scores for that test, but also all the scores for that test. The
solution then would be to adjust all the scores obtained on a particular test on the
basis of the score obtained by the hometown dialect. The rationale here is that no
subject should be expected to do better on a test than the hometown people did.

In the adjustments for subject abilities and test deficiencies, where not only
the hometown score is adjusted but also all the other scores, there are two
strategies which can be used to make the adjustment: proportional or constant. In
the proportional adjustment, the adjusted score is obtained by dividing the raw
score by the hometown score and multiplying by 100 to bring the results back to a
percentage range. The effect is that all scores are raised by an amount
proportional to the size of the raw score. In a constant adjustment, the adjusted
score is obtained by adding to the raw score the difference between 100% and the
hometown score. The effect here is that all scores are adjusted by adding a
constant amount. As a result, the constant adjustment always yields a score greater
than the proportional adjustment for scores less than 100% (that is, whenever the
raw score is below the hometown score).

There are thus five possible methods for adjusting a raw intelligibility score:

(1) hometown,
    adjusted = 100%, if raw score is a hometown score;
             = raw score, otherwise

(2) proportional for subject,
    adjusted = (raw / hometown score for subject) x 100

(3) constant for subject,
    adjusted = raw + 100 - hometown score for subject

(4) proportional for test,
    adjusted = (raw / hometown score for test) x 100

(5) constant for test,
    adjusted = raw + 100 - hometown score for test
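
As a rough illustration, the five formulas above can be written as one small Python
function. The function name, the method labels, and the argument names are
hypothetical conveniences and do not come from the original analysis; scores are
assumed to be percentages on a 0 to 100 scale.

```python
def adjust_score(raw, method, hometown_subject=None, hometown_test=None,
                 is_hometown=False):
    """Apply one of the five adjustments to a raw intelligibility score.

    raw              - raw intelligibility score (percent)
    hometown_subject - hometown score of the subject (or group of subjects)
    hometown_test    - score the hometown dialect obtained on this test
    is_hometown      - True when `raw` is itself a hometown score
    """
    if method == "hometown":
        return 100.0 if is_hometown else raw
    if method == "proportional_subject":
        return raw / hometown_subject * 100.0
    if method == "constant_subject":
        return raw + 100.0 - hometown_subject
    if method == "proportional_test":
        return raw / hometown_test * 100.0
    if method == "constant_test":
        return raw + 100.0 - hometown_test
    raise ValueError("unknown adjustment method: " + method)


# A made-up case: raw score 60%, subject's hometown score 80%.
proportional = adjust_score(60.0, "proportional_subject", hometown_subject=80.0)
constant = adjust_score(60.0, "constant_subject", hometown_subject=80.0)
```

With these made-up figures the proportional adjustment gives 75% and the constant
adjustment gives 80%, which illustrates why the constant adjustment is the larger of
the two whenever the raw score lies below the hometown score.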



Actually, there is no reason to believe that for any given set of data only one type
of adjustment is needed. That is, it is probably closer to reality that the effects
of learning curve, subject abilities, and test deficiencies could be simultaneously
affecting all the results. To find the combination of adjustments which gives
optimal results, however, would take the analysis beyond the techniques of
correlation and regression and into the field of dynamic programming. Thus far this
has not been attempted; only the effects of one adjustment at a time have been
studied.

No previous investigators have come up with suggestions about which adjustments
are most appropriate for what situations. Thus all five adjustments were made on
all cases in the data sample in order to find the adjustment which was most
appropriate for each set of data. The rationale used for selecting one adjustment
as the best is explained in the next paragraph. In the listing of the raw data in
Appendix 1.2, the hometown score for the subject and the hometown score for the test
are listed for each data case. These values, along with the raw intelligibility
score, plug into the above formulas to compute the adjusted scores. The complete
set of adjusted scores is not listed in the appendix. Only one adjusted score is
listed for each case. This is the score which was selected as most appropriate for
the given set of data. In the description of the data sets in Appendix 1.4, the
adjustment used for each set is listed.

The rationale for selecting one method of adjustment as most appropriate for a
given set of data is based on two main assumptions. The first is that there is a
regular relationship between intelligibility and lexical similarity. The second is
that the effects of learning curve, subject abilities, and test deficiencies
introduce measurement errors which perturb, not enhance, the regularity of the
relationship. From these assumptions it follows that an adjustment which brings out
a greater regularity is likely to be nearer the actual underlying relationship than
one which reduces the regularity. To evaluate the effects of the different
adjustment methods, each of the five possible adjustments was performed on each of
the ten data sets. For each data set the methods were compared to find the one
which brought out the most regularity from the raw data. Three criteria were used
to judge this: maximizing the percentage of explained variation, minimizing the
deviation from 100% of the predicted value of intelligibility for 100% lexical
similarity, and minimizing the deviation from the mean of the value of lexical
similarity for 0% intelligibility. The first has to do with regularity within the
particular set of data; the second two have to do with regularity between sets of
data and with a theoretical norm. Never were the three criteria met in the same
adjustment method. It was therefore necessary to make a rather subjective judgment
as to which adjustment gave the best combined effect. The complete set of figures
on which these judgments were based and a fuller explanation of their meaning are
given in Appendix 1.1 so that the interested reader can better understand and
evaluate the selection process used.
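
The three quantities on which the criteria rest can all be read off an ordinary
least-squares line. The sketch below is a minimal illustration under that
assumption; the function name and the dictionary keys are hypothetical, and the
sample values are invented rather than taken from the field studies. It does not
attempt the final choice between adjustments, which the text describes as a
subjective judgment.

```python
def regression_summary(similarity, intelligibility):
    """Fit intelligibility = slope * similarity + intercept by least squares
    and report the quantities used to compare adjustment methods."""
    n = len(similarity)
    mean_x = sum(similarity) / n
    mean_y = sum(intelligibility) / n
    sxx = sum((x - mean_x) ** 2 for x in similarity)
    syy = sum((y - mean_y) ** 2 for y in intelligibility)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(similarity, intelligibility))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {
        "slope": slope,
        "intercept": intercept,
        # percentage of explained variation (squared correlation x 100)
        "percent_explained": 100.0 * sxy ** 2 / (sxx * syy),
        # predicted intelligibility at 100% lexical similarity
        "int_at_lex_100": intercept + 100.0 * slope,
        # lexical similarity at which the line crosses 0% intelligibility
        "lex_at_int_0": -intercept / slope,
    }


# Invented, perfectly regular data lying on Int = 5/3 (Lex - 40).
summary = regression_summary([40.0, 70.0, 100.0], [0.0, 50.0, 100.0])
```

Running the same function on each of the five adjusted versions of a data set
would give the figures needed to weigh the three criteria against one another.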


In Appendix 1.5 new scattergrams for each of the data sets are plotted. This
time lexical similarity is plotted against adjusted intelligibility scores. Again,
the cases demonstrating an intelligibility boost from nonsymmetric social factors
are plotted as "X" and the others are plotted as circles. As before in Appendix
1.3, the two regression lines are drawn in and the key statistics are listed below
the scattergram.

In Figure 5.6 the regression lines for the ten sets of adjusted data with "X"
points excluded are superimposed in one graph. In comparing this graph with Figures
5.1 and 5.3, two things are to be noted. First, there is a much sharper convergence
of the predicting lines toward the (100%, 100%) point. Second, the fanning out of
the lines at the bottom of the graph has been narrowed. The result is that the
eight lines which lie in the middle very nearly represent the same underlying
relationship between lexical similarity and intelligibility.

Figure 5.6 Plots for adjusted intelligibility with exclusions

The details of the ten prediction lines are summarized in Figure 5.7. The
format of this table is identical to that of Figures 5.2 and 5.4.

The effect of adjusting intelligibility scores can be seen by comparing Figure
5.7 with Figure 5.4, the summary for the previous stage in the analysis. The
increase in percentage of explained variation is only 3.5%. Changes in the
correlation coefficient, significance, and standard error of estimate are likewise
minor. The significant changes are in the final two values, "Lex-100" and "Int-0."
The average predicted intelligibility for 100% lexical similarity rises from 88% to
99%; the standard deviation for this value improves sharply from 11% to 3%. With
adjusted intelligibility scores, the predictions therefore give a natural result:
that completely similar dialects share complete intelligibility. The variation
between the prediction lines at the lower end is also reduced; the standard
deviation for the point at which the lines cross the similarity axis is reduced from
40 to 25. If the two sets of data on the periphery (Biliau and Siouan) are not
considered, the degree of agreement between the other eight studies stands out. The
standard deviation for the crossing point is only 8.8, with the mean at 43.8%
lexical similarity.

In comparing the effects of controlling for nonsymmetric social factors and
controlling for intelligibility measurement error, the following can be observed.
The control for social factors improves the prediction accuracy within the various
studies; the adjustment of raw intelligibility scores improves the agreement of
predictions between studies. In other words, the one decreases variation within
studies while the other decreases variation between studies.

5.2.5 Conclusions 

The goal of this analysis has been to see how well lexical similarity predicts
intelligibility. The purpose has been twofold: first, to determine the
relationship between intelligibility and degree of linguistic similarity, and
second, to determine how well lexical similarity can function as an approximation to
linguistic similarity. The main statistic which has been used to evaluate the
results is the percentage of explained variation. At each step in the analysis the
goal has been to explain more variation in intelligibility than was explained in the
previous step by incorporating a new factor into the model to account for some of
the previously unexplained variation. The final step has produced the following
model to explain variation in intelligibility:

    total variation in intelligibility =
        variation explained by lexical similarity +
        variation explained by nonsymmetric social factors +
        variation explained by intelligibility measurement error +
        unexplained variation

where unexplained variation includes variation due to nonlexical aspects of
linguistic similarity, symmetric social relations, intelligibility measurement error
not accounted for by hometown score adjustment (mainly sampling errors), and lexical
similarity measurement error.

Figure 5.7 Statistics for adjusted intelligibility with exclusions

In Section 5.2.2 we found that on the average lexical similarity alone explains
65% of the variation in raw intelligibility scores. In Section 5.2.3 we found that
by excluding cases in which it was suspected that nonsymmetric social factors
boosted intelligibility, the percentage of explained variation was increased to 81%.
We can therefore infer that the difference between these two percentages, or 16%, is
the amount of unexplained variation in the original formulation which was due to
nonsymmetric social factors. In Section 5.2.4 we found that if the cases which
explained 81% of the variation in intelligibility were adjusted to control for some
aspects of intelligibility measurement error, then the percentage of explained
variation was raised to 84%. We can therefore infer that the amount of unexplained
variation in the original formulation which was due to intelligibility measurement
error was 3%. The decomposition of total variation is as follows:

    variation due to lexical similarity                     65%
    variation due to nonsymmetric social factors            16%
    variation due to intelligibility measurement error       3%
    unexplained variation                                   16%
    total variation in intelligibility                     100%

This method of decomposing variation is called a hierarchical one in that the
components in the total variation are peeled off layer by layer. If the order in
which the components are extracted is changed, the magnitude of the percentage of
explained variation for each component may change slightly. For example, when the
effect of measurement error is controlled for first, and nonsymmetric social factors
second, the decomposition is as follows:

    variation due to lexical similarity                     65%
    variation due to intelligibility measurement error       5%
    variation due to nonsymmetric social factors            14%
    unexplained variation                                   16%
    total variation in intelligibility                     100%
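
The arithmetic behind both orderings is simply a matter of differencing the
cumulative percentages of explained variation at each stage. A small Python sketch
makes this explicit; the function name and the short component labels are
hypothetical, while the cumulative figures (65, 81, 84 and 65, 70, 84) are the ones
reported in this section.

```python
def hierarchical_decomposition(stages):
    """Peel components off layer by layer.

    stages - list of (label, cumulative percent explained) in the order
             in which the factors were brought into the model
    Returns a list of (label, percent attributed to that component),
    ending with the unexplained remainder.
    """
    parts = []
    previous = 0.0
    for label, cumulative in stages:
        parts.append((label, cumulative - previous))
        previous = cumulative
    parts.append(("unexplained", 100.0 - previous))
    return parts


# Order followed in the analysis: social factors first, then measurement error.
social_first = hierarchical_decomposition(
    [("lexical", 65.0), ("social", 81.0), ("measurement", 84.0)])

# Alternative order: measurement error first, then social factors.
measurement_first = hierarchical_decomposition(
    [("lexical", 65.0), ("measurement", 70.0), ("social", 84.0)])
```

Either way the unexplained remainder is 16%; only the split between the two middle
components depends on the order of extraction.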

For the sake of interpreting the results, this latter ordering of the decomposition
is perhaps more natural than the former. The former was followed in the analysis
because the social factors explained a much greater proportion of the variation than
did the intelligibility measurement error. By controlling for the social factors
first it was possible in the analysis to select the methods of intelligibility score
adjustment so as to give the most refined analysis for the final result.

In this latter decomposition, 70% of the total variation in intelligibility is
explained by the first two factors, lexical similarity and intelligibility
measurement error. This explanation of 70% of the variation in intelligibility has
been made with recourse to only two variables, measured intelligibility and measured
lexical similarity. The control for intelligibility measurement error comes only
through a systematic transformation of the original measurements based on
measurement of hometown scores. Thus no additional variables are measured or
included in the model. The fact that by knowing only one thing about the
relationship between speech communities, the degree of lexical similarity between
them, we can explain the intelligibility relations between them with 70% accuracy is
a dramatic result.

Many investigators have avoided the use of lexical comparison as a means of
estimating intelligibility on the grounds that there are so many other factors
involved: phonological similarity, grammatical similarity, semantic similarity,
social relationships, political relationships, economic relationships, and
geographic relationships. Nevertheless, for these ten field studies the single
factor of lexical similarity explains 70% of the variation in intelligibility in the
average case. The many other factors serve only to account for the remaining 30% of
unexplained variation. This does not necessarily mean that these other factors are
irrelevant or of only minor importance; rather, it probably indicates that lexical
similarity parallels other aspects of linguistic similarity and even some aspects of
contact such as measures of social and geographical proximity.

The implication for field research is clear: lexicostatistic comparisons are a
valuable tool in sociolinguistic research on communication between speech
communities. This is not only because they are quick and easy, but also because
they serve as reasonable estimators of intelligibility.

The fact that the regression lines in Figure 5.6 agree to such a great extent
also has important implications. Eight out of ten of the field studies point to
nearly the same underlying relationship between lexical similarity and
intelligibility. This suggests that it is not vain to search for a universal
relationship between linguistic similarity and inherent intelligibility (that is,
intelligibility based entirely on linguistic similarity and not at all on learning
due to contact).

Of the two studies which do not fit the general pattern, one predicts higher
intelligibility and the other predicts lower. In the Biliau study, the one which
predicts higher intelligibility, the cause is definitely symmetric social relations.
The two most divergent dialects in that study are only three hours' walking distance
apart and there is a lot of contact between them in both directions. In the Siouan
study, the one which predicts lower intelligibility, the available data do not
provide an answer. The cause may lie in some aspects of linguistic similarity other
than cognate percentages. If this were so, then in only one out of ten field
studies did lexical similarity fail to parallel other aspects of linguistic
similarity.

Raymond Gordon (personal communication), one of the investigators in the Siouan
survey, suggests that the low intelligibility scores may reflect an unwillingness on
the part of the subjects to give a response when they were at all uncertain. This
is an interesting hypothesis which deserves further attention in future
intelligibility surveys. It suggests that this is one case where socio-cultural
factors in the test situation would hardly affect the hometown test (since there
would be little or no uncertainty) but would affect the other tests. Therefore this
kind of measurement error would go undetected by the raw score adjustment methods
discussed in Section 5.2.4.

A final observation is that the results show a striking uniformity in spite of
the fact that the ten studies were conducted by ten different investigators, all of
whom used different methods for measuring intelligibility and different word lists
and variations in technique for scoring lexical similarity. The implication here
for intelligibility testing methods is that no one method is inherently better than
another. Some investigators used a translation approach, some used an open-ended
question approach, and others used a multiple choice question approach; some used an
oral approach and others used a written approach; some used a vernacular approach
and others used a common language approach. In spite of these differences, the
results from study to study are surprisingly similar. This would suggest that the
decision as to which kind of method to use is not based on the inherent merits of
the method, but is based on the abilities of the subjects and the goals of the
investigator (Section 2.3).

The implication of this uniformity in results for lexicostatistics is that its
future in synchronic research on communication potential between dialects is
promising. A number of authors (for instance, McElhanon 1971:141, Hymes 1960:32)
have expressed concern that lexicostatistics must undergo some precise development
and standardization if results of the method are going to be valid and comparable.
Their remarks are relevant mainly to the diachronic, or historical, application of
lexicostatistics to questions of linguistic history and taxonomy (Simons
1977d:14-17). Here, on the other hand, we have seen that whether investigators use
100-, 165-, or 200-word lists, and whether they elicit basic or cultural vocabulary,
despite idiosyncratic differences in eliciting and scoring methods the underlying
results of all methods are strikingly similar.

5.2.6 General models for predicting intelligibility from lexical similarity

In Figure 5.6 it was shown that eight of the ten field studies very nearly
suggest the same underlying relationship between intelligibility and lexical
similarity. The data from these eight studies are now pooled together to form one
large data set. The object of this section is to investigate the possible universal
relationship between intelligibility and lexical similarity as evidenced by these
combined data. First two linear models are given; then seven different nonlinear
models are explored. The nonlinear models offer slight improvements in prediction
accuracy, but in no case is this improvement statistically significant. The final
conclusion is that the data points are too scattered to permit much discrimination
between different models.

The complete pooled data set is shown in the scattergram in Figure 5.8. It
contains 175 cases. Adjusted intelligibility scores are used and the points are
excluded in which an intelligibility boost from nonsymmetric social factors is
suspected. The straight line which best describes the relationship between the two
variables is drawn into the graph. The equation for the line is written below the
scattergram. Note that the model explains nearly 85% of the variation in
intelligibility, and that the standard error of estimate for predictions based on
the model is 13%.

The slope constant for the linear model is nearly 1.667, or five-thirds, and
the intercept constant is nearly -66.67. In Figure 5.9 the linear model is
simplified by rounding the constants to these values, which are easy to work with
and easy to remember. If the 1.667 is factored out of the -66.67, the resulting
formulation makes the model more transparent as to its meaning:

    Intelligibility = 5/3 (Similarity - 40)

This model says that when the lexical similarity is below 40%, there will be no
understanding (it actually predicts a negative value). When lexical similarity
exceeds 40%, the percentage of intelligibility is five-thirds times the amount by
which the similarity exceeds 40%. Stated in another way, for every percentage point
which lexical similarity increases beyond 40%, the degree of intelligibility
increases by one and two-thirds per cent. Simplifying the model in this way reduces
the percentage of explained variation from that of the exact model given in Figure
5.8 by less than one per cent.

Figure 5.8 Linear model

Figure 5.9 Simplified linear model
    (lexical similarity plotted against intelligibility)
    Int = 1.667 Lex - 66.67, or Int = 1.667 (Lex - 40)
    %EV = 84.2    SEE = 13.8
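
The simplified linear model is easy to apply by hand, but a short Python sketch may
help in checking predictions. The function name is hypothetical, and the clamping of
predictions to the range 0 to 100 is an added convenience based on the reading that
a negative prediction means no understanding; the unclamped value is what the
equation itself yields.

```python
def predict_intelligibility(similarity, clamp=True):
    """Simplified linear model: Intelligibility = 5/3 (Similarity - 40).

    similarity - lexical similarity in percent
    clamp      - if True, limit the prediction to the 0..100 range
                 (an assumption added here, not part of the equation)
    """
    predicted = 5.0 / 3.0 * (similarity - 40.0)
    if clamp:
        predicted = max(0.0, min(100.0, predicted))
    return predicted


at_70 = predict_intelligibility(70.0)                 # 30 points above 40%
at_30 = predict_intelligibility(30.0)                 # below the 40% threshold
at_30_raw = predict_intelligibility(30.0, clamp=False)
```

For 70% lexical similarity the model predicts about 50% intelligibility; for 30% it
predicts none once clamped, and a negative value if the equation is applied as is.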

There is so much scatter in the scattergrams that it is difficult to see what
kind of trend the data actually suggest. One way to remove the scatter but still
preserve the actual trend in the data is to plot the mean value of intelligibility
for specific ranges of lexical similarity. This is done in Figure 5.10. The
similarity scale is divided into segments spanning five percentage points. For
instance, all the points with similarity greater than 95% and equal to or less than
100% are treated as one subset. The average lexical similarity and intelligibility
for these points is computed and the point where those two values intersect is
plotted. The same is done for the range of 90% to 95%, 85% to 90%, and so on until
all the data points are accounted for. The plotted points of mean similarity versus
mean intelligibility are connected with solid lines in a dot-to-dot fashion to give
a graph of the trend in the data. This plot will appear in dashed lines in all of
the graphs for nonlinear models which follow.
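
The trend line construction just described can be sketched as follows. This is a
minimal illustration with invented data points; the function name is hypothetical,
and the handling of a similarity of exactly 0% (which falls into a segment of its
own) is an assumption, since the text does not discuss that boundary.

```python
import math
from collections import defaultdict


def trend_points(cases, width=5):
    """Average the cases within each segment of the similarity scale.

    cases - list of (lexical similarity, intelligibility) pairs in percent
    width - width of each segment in percentage points
    Segments are open below and closed above, e.g. (95, 100], so a
    similarity of exactly 95 belongs to the (90, 95] segment.
    Returns (mean similarity, mean intelligibility) per occupied segment,
    ordered from the lowest segment to the highest.
    """
    segments = defaultdict(list)
    for sim, intel in cases:
        upper_edge = math.ceil(sim / width) * width
        segments[upper_edge].append((sim, intel))
    points = []
    for upper_edge in sorted(segments):
        members = segments[upper_edge]
        mean_sim = sum(s for s, _ in members) / len(members)
        mean_int = sum(i for _, i in members) / len(members)
        points.append((mean_sim, mean_int))
    return points


# Invented sample: two points in (95, 100], two in (90, 95], one in (45, 50].
sample = [(96, 90), (100, 100), (95, 80), (91, 70), (50, 10)]
trend = trend_points(sample)
```

Connecting the returned points in order gives the dot-to-dot trend line; the same
segment means could also serve as the predictions of a step function.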

Now seven different nonlinear models are explored. All offer slight
improvements in the percentage of explained variation over the simplified linear
model. However, the greatest improvement is only four per cent and none of the
models can be said to be significantly better than any of the others. This is
because of the amount of scatter in the data. I suspect that as methods of
measuring intelligibility, linguistic similarity, and social contact relationships
are refined, the amount of scatter in future plots of this kind will be reduced and
the differences between the degree of fit of the different models will become
significant. The following discussion and graphs of nonlinear models are included
not so much for what they reveal about the current data but as a guide for future
research. The nonlinear functions were fitted to the data by least-squares
techniques with a computer program. For input I specified the data and the form of
the equation; the program computed the values of the constants in the equations.

A nonlinear model which immediately comes to mind is one based directly on the
trend line. For each five percentage point segment on the similarity scale we could
predict that the degree of intelligibility will be the mean intelligibility for that
segment. This model is plotted in Figure 5.11; it is called a step function. This
model gives the highest percentage of explained variation of all the models we
consider (88.6%), but that is little wonder since the predictions are based directly
on what intelligibility was observed to be rather than on what we might expect it to
be on the basis of some general mathematical function. The model is actually quite
clumsy in that the mathematical formulation of it is so lengthy, and it does not seem
very natural. We would expect that intelligibility would consistently increase as
lexical similarity increases, not fluctuate up and down as the model in Figure 5.11
does at the lower end of the graph.

Figure 5.11 Step function
Figure 5.12 Polygonal model

One way to approximate a nonlinear function is to use different linear
functions to describe different portions of the curve. In Figure 5.12 the
intelligibility curve is approximated by two straight lines. Inspection of the
trend line shows that above 60% similarity, intelligibility steadily rises. Below
60% the intelligibility fluctuates and shows no steady trend. A linear regression
analysis was performed on these two halves of the data to find the lines which best
fit each. The resulting function is called a polygonal function and the value of it
is the maximum of the values predicted from the two linear functions. Note that the
polygonal model never predicts the absence of intelligibility (although it could be
constrained to do so without any significant loss in accuracy); even when there is
no similarity, this model predicts 6% understanding. The simplified linear model
in Figure 5.8 is also polygonal when we interpret it as predicting 0%
intelligibility when similarity is 40% or less.
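
A polygonal model of this kind can be sketched as follows. The sketch assumes ordinary least squares for each half and assigns points at exactly 60% similarity to the lower half; the function names and the sample data in the usage note are hypothetical and are not the Solomon Islands data.

```python
def fit_line(points):
    """Ordinary least-squares line through (x, y) points.

    Returns (slope, intercept). Assumes at least two distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_polygonal(points, breakpoint=60.0):
    """Fit one line to each half of the data, split at `breakpoint`.

    Assumption: points exactly at the breakpoint go to the lower half.
    """
    lower = [p for p in points if p[0] <= breakpoint]
    upper = [p for p in points if p[0] > breakpoint]
    return fit_line(lower), fit_line(upper)


def predict_polygonal(model, similarity):
    """The polygonal prediction is the maximum of the two linear predictions."""
    return max(slope * similarity + intercept for slope, intercept in model)
```

With invented data that is flat at 10% below the breakpoint and rises two points of intelligibility per point of similarity above it, `predict_polygonal` returns the flat value at 50% similarity and the rising value at 90%. Taking the maximum is what joins the two segments into a single curve.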

In Figure 5.13, a parabolic model is fitted to the data. Note that it too
never predicts the absence of intelligibility. To correct this flaw in the model,
the parabola can be constrained to reach the zero level of intelligibility. This is
done in Figure 5.14 by constraining the formula for the parabola such that the
intelligibility coordinate of the minimum point of the curve is zero.

In Figures 5.15 and 5.16 two exponential functions are plotted. The first is
the basic exponential function. Note that it never reaches zero. Figure 5.16 plots
a modified exponential function which incorporates an additive constant to bring the
curve below the zero intelligibility axis.

The final model is called a logistic, or S-curve, model. It is plotted in
Figure 5.17. The logistic model is unique among all of the models considered here
in that it places an upper limit on the value of intelligibility and predicts that
as the upper limit is neared, increases in similarity have less and less effect on
intelligibility. Whereas in all the other models intelligibility increases at a
constant or a growing rate as similarity increases, in the logistic model the rate
of change slows down and levels off as intelligibility reaches its limit of 100%.
This is in line with a theoretical expectation, namely the role of redundancy in
dialect intelligibility. Because of the redundancy in language, listeners are able
to fill in some of the items they hear that are not familiar to them. I would
expect that in the range of 70% to 90% similarity, the redundancy strategy is used
with the greatest benefit. In this range, an increase in similarity could be
expected to give a substantial increase in intelligibility, not only because that
much more is similar but because that much more can be used as a base from which to
fill in that which is not familiar. Above 90% similarity, almost everything would be
understood, so that increasing the similarity would only slightly increase the
intelligibility.

Of all the models explored, the logistic seems the most theoretically
satisfying. However, the current data do not give strong evidence that the
relationship between intelligibility and lexical similarity is a logistic one. One
problem is the degree of scatter in the data, which has already been mentioned.
Another factor is that the formula used represents a symmetric curve although the
relationship may not actually be symmetric. The curve is symmetric around the
inflection point (72%). This explains why 100% similarity does not predict 100%
intelligibility in Figure 5.17. The many data points around 50% similarity pull up
the curve at the low end, which has the effect of pulling it down at the high end.
This shortcoming might be overcome by proposing a model which was not symmetric.
This could be theoretically justified by demonstrating that different understanding
processes are at work at different ends of the intelligibility scale.
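
The symmetry of the logistic curve, and its failure to reach 100% at full similarity, can be checked numerically. The sketch below uses the constants printed with Figure 5.17 (a rate of .0844 and a midpoint of 72.07); the negative sign of the exponent is an assumption, since the printed formula is damaged, and the function name is hypothetical.

```python
import math


def logistic_intelligibility(similarity, rate=0.0844, midpoint=72.07, ceiling=100.0):
    """Predicted intelligibility (percent) for a lexical similarity (percent).

    Assumed form: ceiling / (1 + e^(-rate * (similarity - midpoint))).
    The constants are those printed with Figure 5.17; the sign of the
    exponent is an assumption made so that the curve rises with similarity.
    """
    return ceiling / (1.0 + math.exp(-rate * (similarity - midpoint)))


if __name__ == "__main__":
    for s in (40, 60, 72.07, 80, 90, 100):
        print(f"{s:6.2f}% similarity -> {logistic_intelligibility(s):5.1f}% intelligibility")
```

Under these assumptions the curve passes through 50% intelligibility at the midpoint, and predictions at equal distances above and below the midpoint always sum to 100%. That symmetry is what holds the prediction for 100% similarity below the ceiling.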



Figure 5.13 Parabolic model
Figure 5.14 Constrained parabolic model
Figure 5.15 Exponential model
Figure 5.16 Modified exponential model
Figure 5.17 Logistic model (intelligibility plotted against lexical similarity)

$\mathrm{Int} = \dfrac{100}{1 + e^{-.0844(\mathrm{LEX} - 72.07)}}$, %EV = 84.9, SEE = 13.0



CHAPTER 6

EXPLAINING COMMUNICATION: SOCIAL FACTORS



In Section 5.2.5 we saw that lexical similarity alone explains 70% of the
variation in adjusted intelligibility. When presumed nonsymmetric social factors
were added to the model, 84% of the variation was explained. That is, nonsymmetric
social factors explain nearly half of the previously unexplained variation. If
social relations had actually been measured, then symmetric social relations would
have been included in the model, and even more of the variation in intelligibility
might have been explained. In this chapter, this is done. The results of an
empirical study of communication on Santa Cruz Island show that social factors
account for most of the unexplained variation which remains after lexical similarity
is controlled for.

In Section 6.1 social relations are defined and enumerated, especially with
reference to their function within dialect systems. In Section 6.2 a mathematical
model for explaining communication is developed; it includes both linguistic
similarity and social relations. In Section 6.3 the general model is tested
empirically with data from Santa Cruz Island, Solomon Islands. The results show
that predictions of intelligibility made by the final model are correct 95% of the
time.



6.1 The characteristics of a dialect system

A key to understanding communication patterns is to view the dialects involved
as comprising a system, more specifically, a dialect system. A dialect system has
four defining characteristics: (1) it is a set of dialects, (2) the dialects are
linked by relations of interaction, (3) the relations between pairs of dialects are
defined by the interdependence of all dialects, and (4) common dependence on a
center accounts for general patterns of communication. Each of these four
characteristics is now discussed in turn.

6.1.1 A set of dialects 

First of all, a dialect system consists of a set of dialects. When considering
linguistic relations we concentrated on dialects as varieties of speech. Now in
considering social relations our perspective turns to dialects as the groups of
people who share those speech varieties.

Michael Halliday, in defining grammatical systems, is more precise about the set of
items which comprise a system. He states that it must contain a finite number of
members and that each member is exclusive of (that is, different from) all others
(Halliday 1961:247). Such is true of dialect systems also, where the dialect groups
are the minimal members of the system.

The system must also be closed. That is, we must assume that there are no
speech communities outside the system which affect patterns of communication within
the system. In the real world, this is obviously not true in a strict sense, except
perhaps in the case of a dialect system confined to an isolated island. However,
one can generally safely assume that the effects of outside speech communities are
negligible when compared to the effects of inside groups. When a model is
empirically tested, influence from outside the dialect system will show up as
unexplained variation. If the amount of unexplained variation is negligible, the
assumption of closure is justified; if it is not, the assumption may have to be
reexamined.

6.1.2 Linked by interaction

A second characteristic of a dialect system is that the dialects are linked by
relations of interaction. These links are social, economic, geographic, political,
and ideological in nature. However, they are ultimately realized as communication
between individuals in speech communities. All these different types of links are
what I have thus far lumped together as "social factors". In this section different
facets of interaction are explored under three main headings: channels of
interaction, patterns of interaction, and measuring interaction and contact.

6.1.2.1 Channels of interaction

By channels of interaction I refer to the channels through which interaction
occurs. I concentrate on the causes of interaction and classify interaction as
motivated by geography, demography, community facilities, or associations.

Geography is a channel of interaction primarily because it governs the ease of
travel between speech communities. One aspect of geography is proximity. The
nearer two communities are, the more likely they are to interact. This includes the
likelihood of both planned interaction and chance interaction. Planned interaction
occurs when a journey is made with the expressed intent of interacting with members
of another speech community. Chance interaction takes place when a meeting is
unplanned but occurs because members of at least one of the communities are
traveling.

Other aspects of the geographic factor are terrain and routes of travel.
Mountain ranges, rapid rivers, and swamps may be barriers to interaction.
Conversely, roads, navigable rivers, or a coastline may boost travel and
interaction.

Demography, particularly the density and distribution of population, also
contributes to interaction. The higher the population of a dialect, the greater the
likelihood of either planned or chance interaction involving it. Not only the
population, but also the density of population in the surrounding region can
encourage interaction. That is, if a small speech community had a large neighbor,
it would be more likely to attract interaction from more distant speech communities
than if it had no neighbor at all.

Community facilities, which include for instance stores and churches, are focal
points of activity where interaction takes place. These facilities attract people
from other communities who come to partake of the goods and services which the
facility offers. The result is interaction between members of the host community
and the visitors, as well as between visitors who might come from different
dialects. These community facilities are generally quite visible; they are usually
located in a building or some other man-made structure. Because they are so
visible, they give easy but good clues to patterns of communication for the field
investigator.

In most developing countries of the world it is possible to distinguish two
levels of culture (Combs 1977). One is the traditional culture as practiced by the
indigenous inhabitants of the land. The other is a dominant national culture which
is often colonial and European in its origin. Some of the community facilities
might be part of the traditional culture, for instance, a religious cult house, a
traditional marketplace, or the residence of a political leader. However, in
today's world, most community facilities seem to be part of the national culture.
(The institutions of the traditional culture appear to be more commonly realized in
the networks of associations considered next than in specific focal points of
activity.)

The community facilities of the dominant national culture may affect any aspect
of life. In the case of a store or an industry, the focus of interaction is
economic. In the case of a church, it is religious and social. In the case of an
administrative headquarters or police station, it has to do with politics,
government, or law and order. In the case of schools, the focus is on education and
socialization. In the case of a hospital or clinic, it is sickness and health. In
the case of a road, an airstrip, or a harbor, transportation is the focus. All of
these facilities can be the site for significant contact between peoples of
different dialects, and thus the location of each and the dialects served by each
are important for explaining interaction.

Associations between dialects can be as important a channel of interaction as
community facilities, though they are generally less visible and thus more difficult
for the outside investigator to observe. Some cultural institutions realize
themselves in focal locations where goods and services are obtained; others are
realized in networks of associations or alliances which link dialect groups
together. On the social side, marriage is one such source of interaction. When
marriages occur between speakers of different dialects there are at least two
relevant effects: (1) the children from that marriage usually grow up in contact
with both dialects, and (2) the marriage may bind not only the two individuals, but
also their whole families or lineages. The result is a channel of interaction
between the groups as visits between villages are made. Adoption alliances can have
similar effects in some societies.

On the economic side, traditional trading alliances can be a source of
interaction. Even if the trading occurs infrequently, it can be important because
it is a source of regular interaction. Perhaps the best documented example of
trading alliances is the vast Kula ring off the eastern tip of Papua New Guinea
(Malinowski 1922). This trading ring connected many distant islands as well as a
few spots on the mainland. Although the trading occurred only once yearly, it had a
profound effect on those involved and resulted in life-long partnerships between men
of different islands and different languages.

Kenneth McElhanon (1970) has discussed the relation of trade routes and
linguistic interaction in the Huon Peninsula of Papua New Guinea. That whole area
is characterized by extremely rugged terrain. As a result, trade routes are well
defined and confined to certain mountain passes. In explaining the occurrence of
borrowings in lexical cognate percentages, he suggests that the borrowing occurs
along the trade routes (1970:216). This is evidence for linguistic interaction
along the lines of trading alliances.

Two other aspects of cultural interaction are religious and political. On the
religious side, different dialect groups may interact in religious ceremonies and
rituals. In Melanesia, all-night dances are common (singsing in New Guinea Pidgin,
dans in Solomon Islands Pidgin). These dances have their roots in the belief and
ritual systems of the traditional culture; however, as islanders adopt Christianity
these gatherings are beginning to take on a more purely social function. The dances
are occasions for interaction between speech communities. Typically the host
village invites numbers of surrounding villages (often from different dialects) to
join in the dance, as well as in the giving or exchanging of vast quantities of food
or valuables which may occur at the same time. On the political side, village
defense alliances may span different dialects. Prominent leaders may have
jurisdiction over more than their own speech community.

All of the above factors (geography, demography, community facilities, and
associations) cause interaction between speech groups. In Section 6.1.2.3 some
methods by which these can be measured are briefly discussed.

6.1.2.2 Patterns of interaction

In order to explain dialect intelligibility, it is necessary to distinguish
between interaction and contact. To say that speakers of dialect A have frequent
interaction with speakers of dialect B suggests nothing of how well A might
understand B's dialect. When they interact, they might use only A's dialect or they
might use only a third language, so that A never hears B's dialect. This is where
contact comes into the picture. In this hypothetical case, we would say that A has
frequent interaction with B but has no contact with B's dialect.

I define interaction to be a reciprocal, two-way phenomenon. That is, it takes
place in two directions at once and in both directions it has the same intensity; A
has as much interaction with B as B has with A. It makes no reference to who does
the talking and it makes no reference to what varieties of speech are used. On the
other hand, I restrict the meaning of contact to refer to a nonreciprocal, one-way
phenomenon defined specifically in terms of the variety of speech used. By saying
that A has contact with B, I mean specifically that A has contact with B's variety
of speech, or dialect. The relationship is nonreciprocal and one-way because
knowing how much contact A has with B's dialect tells us nothing of how much contact
B has with A's dialect. It is therefore A's contact with B's dialect, not the
interaction between them, which explains A's understanding of B.

On the basis of the contact relations involved in interaction, I propose a
classification of patterns of interaction into four types: (1) balanced,
(2) imbalanced, (3) rival, or (4) distant. A balanced interaction is defined as one
in which the speech varieties of both participants are used to an equal extent.
That is, when person A speaks, he uses his own dialect; when person B speaks, he
uses his own dialect; or they both could swap off using each other's dialect. For
predicting communication between dialects, it is more useful to classify patterns of
interaction with regard to the whole pattern of interaction between two dialects
rather than in isolated conversations. Defined over a pair of dialects, balanced
interaction would mean that on average both speech varieties are used to an equal
extent. It could be that when speakers of A and B meet in the village of A, both
speaker and hearer use the dialect of A. But a balanced interaction would also
imply that if these same speakers were in village B, both speaker and hearer would
use B's dialect.

Skipping to the fourth type, distant interaction is also straightforward and needs
little comment. In such a type of interaction, the participants have had so little
interaction (probably due to geographic and linguistic separation) that they are not
able to use either of their own dialects. Instead, they must use a common language
such as a trade language or the national language.

The second type of interaction, imbalanced interaction, is one in which contact
between the two dialects is greater in one direction than in the other. This
pattern of interaction is especially important to the language planner because it
commonly results in nonreciprocal intelligibility and points to centers in patterns
of communication. Using the example of speakers from dialects A and B, interaction
would be imbalanced if both participants generally spoke A's dialect when conversing
with one another, or if both generally spoke B's dialect.

The explanation of imbalanced interaction can be found in the causes of
interaction discussed in the previous section. The imbalance could be due to the
central geographic location of one dialect as opposed to the remote location of the
other. It could be due to the large population of one dialect as opposed to the
small population of the other. It could be due to the availability of goods and
services at the community facilities located in one dialect and their absence in the
other. It could be due to the widespread marriage, trading, or defense alliances of
one dialect and the limited alliances of the other. All of these relations suggest
an imbalance, with the result that in each case, the movement of people will be
greater in the direction of the first dialect than in that of the second. When the
movement of people is imbalanced, then we can also expect that the effects of
dialect contact and learning will be imbalanced. The group which puts more effort
into mobility is likely to put more effort into dialect learning. The group which
is more static is more likely to be less accommodating linguistically.

The term "prestige" has been used by other investigators to label imbalanced
relationships. However, I feel that the term is not adequate because it is not
general enough: a prestige relation is only a special case of an imbalanced
relation. The use of the term "prestige" dates back at least to Leonard Bloomfield.
In discussing the social conditions which foster language borrowing, he suggested
that there are two main factors, "the density of communication and the relative
prestige of different social groups" (1933:345). Charles Hockett, in his textbook
on general linguistics, devotes a section of the chapter on the conditions for
borrowing to the idea of prestige. He says that the speaker must have some motive
for borrowing and that two motives stand out as the most important, the prestige
motive and the need-filling motive (1958:104). Although these authors were speaking
of prestige as a motive for language borrowing, the term has become widespread and
found its way into the literature on dialect intelligibility. For instance,
Ladefoged, Glick, and Criper (1972:77) state that "the percentage of words in common
allows us to predict the degree of comprehension, except when questions of prestige
are involved." However, the word "prestige" carries with it connotations of
"esteem" and "admiration". For this reason, the term is not really appropriate in
the general use it has received.




The anthropologist S. F. Nadel, in his book The Theory of Social Structure,
offers a general framework in which to consider imbalanced social relations. He
suggests that one of the factors which explains differential status in social
systems is the relative "command over services and benefits" (1957:117). He lists
some of the services and benefits an individual or group might command as:
"(1) material resources and benefits; (2) social dignity (prestige, esteem, status
in a hierarchical sense); ... (4) emotional, sensual, and aesthetic gratifications;
(5) moral values (the fulfilment of duties and 'missions'); and (6) transcendental
values (the 'spiritual' benefits of religion)" (1957:118). Prestige figures into
this list as only one of many possible motivations for imbalanced contact. The
general motivation underlying all of the above is probably need or expediency or
lack of alternatives. The one group has command over something (be it material
resources, prestige, special learning or skills, religious knowledge) that the other
group feels a need for. When the second group goes to the first group to fill that
need, imbalanced interaction is likely to occur.

The remaining pattern of interaction is one of rivalry. In this relationship,
the two dialects are similar and both participants could understand and perhaps even
use the speech of the other. However, because of rivalry between the two groups
they avoid the use of the local dialects when interacting with one another.
Instead, they prefer to use a national language or trade language, which serves to
deny any linguistic unity between the groups. Sometimes the rivalry emotion is
one-sided; one group strives for disassociation, while the other group does not. In
the case of the distant pattern of interaction, the distance separating the groups is
such that the participants could not use the local dialects even if they wanted to;
in the case of the rivalry pattern, both participants could use the local dialect,
but at least one does not want to.

Hans Wolff's paper "Intelligibility and Inter-ethnic Attitudes" (1959) gives
examples of the rivalry pattern of interaction. For instance, until recently it was
generally agreed by speakers of the Urhobo and Isoko dialects of southwestern Nigeria
that the two dialects were mutually intelligible. However, Wolff reports that
lately the Isoko speakers are claiming otherwise. He states that "this claim has
coincided with Isoko demands for greater self-sufficiency" (1959:37).

Attitude is a term which often enters into discussions of explaining
communication (Wolff 1959, Casad 1974:185-188, Callister 1977, Collier 1977).
Therefore it would be good to clarify the position of attitudes in explaining
communication. By attitudes I am referring to feelings one group might have toward
another, feelings such as friendliness or animosity, esteem or scorn, trust or
suspicion. I feel that attitudes are not so much a direct factor in explaining
degree of intelligibility as they are in explaining patterns of interaction. That
is, attitudes affect patterns of interaction, which in turn affect intelligibility.
Casad's model also reflects this view (1974:184-186). Thus, if contact, which
factors out the components of interaction, is measured and plugged into a model to
explain communication, instead of using reciprocal interaction in the model, then
attitudes have already entered into the contact factor and do not play a separate
role in the model. However, when contact is also predicted (see Section 6.2.2),
then attitude becomes more important. Ultimately, attitude probably has a bigger
role to play in determining the acceptability of materials written in one dialect to
speakers of another dialect than it does in explaining one group's comprehension of
another.

6.1.2.3 Measuring interaction and contact

One way to gather information about patterns of interaction is to ask about
them directly. For instance, go into a speech community and ask, "When you meet
someone from that other community, do you each speak your own dialects, do you speak
his dialect, does he speak yours, or do you both use the trade language?" If both
use their own dialects, a balanced relationship is implied. If they use one
dialect exclusively, the relationship is imbalanced in the direction of that
dialect. If they use a third language, distance or rivalry is indicated.
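
The reading of answers to this direct question can be written as a small decision rule. The sketch below is illustrative only: the labels "A", "B", and "other", the function name, and the "unclassified" fallback are assumptions, and treating the swapped case (each using the other's dialect) as balanced follows the definition of balanced interaction in Section 6.1.2.2 rather than the question itself.

```python
def classify_interaction(a_uses, b_uses):
    """Classify a reported pattern of interaction between communities A and B.

    a_uses, b_uses: the variety each side reports using when they meet,
    given as "A", "B", or "other" (a trade or national language).
    """
    if (a_uses, b_uses) in (("A", "B"), ("B", "A")):
        # Each speaks his own dialect, or each speaks the other's:
        # both varieties are used to an equal extent.
        return "balanced"
    if a_uses == b_uses and a_uses in ("A", "B"):
        # One dialect is used exclusively by both sides.
        return "imbalanced toward " + a_uses
    if a_uses == "other" and b_uses == "other":
        # A third language is used; the question alone cannot tell
        # distance from rivalry.
        return "distant or rival"
    # Mixed reports fall outside the three cases named in the text.
    return "unclassified"
```

A report that both sides use A's dialect returns "imbalanced toward A". Separating distance from rivalry requires further evidence, such as whether the two dialects are similar enough for the speakers to use them if they wished.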

The methods described in the rest of this section refine and validate these
initial findings by uncovering the channels of interaction and by estimating the
degree of contact in each direction of the interaction. Many lines of investigation
are suggested. Not all will be appropriate for every situation, and there will be
exceptions to every trend I suggest. The approach I have used in the field is to
follow up lines of investigation which seem fruitful for the given situation, and
then to perform a statistical analysis on the findings (Section 6.3) to discover what
aspects of interaction and contact best explain communication.

As suggested in the preceding section, the one-way contact which one dialect has
with another is more important in explaining communication than the two-way
reciprocal interaction between the two dialects. Thus, the methods described here
concentrate on the observation and measurement of one-way contact rather than
two-way interaction. The presentation here is brief; the paper by Sandra Callister
(1977) on sociolinguistic approaches to dialect surveying in Papua New Guinea is
probably the best guide presently available for formulating questions and presenting
the results of an investigation of contact relations. She lists many more possible
questions than I do here, discusses the different ways a question can be phrased,
and illustrates the method of presenting results in a matrix. Delbert Miller's
Handbook of Research Design and Social Measurement (1977) may also be helpful. It
is an exhaustive guide to sociometric methods in general.

The measurement of geographic factors is fairly straightforward. Distances can
be measured from a map in miles or kilometers. A more meaningful measure is perhaps
the traveling time between speech communities expressed in hours or minutes or
perhaps even days. Traveling time takes into account some of the geographic
barriers to interaction such as mountain ranges, as well as some of the boosters of
interaction, such as roads or navigable rivers. The raw distance measurement is a
two-way, reciprocal predictor of interaction. In Section 6.1.3 the concept of
measuring distance relatively with respect to the dialect system rather than against
an absolute measuring scale is presented. This has the result of giving a
geographic estimate for one-way contact. This technique is illustrated with the
data in Appendix 2.1.3 and Section 6.3.4.

Population can be measured or estimated by means of census techniques. The
population of one group relative to a second gives a one-way estimate of contact.
A measure of relative population is computed by dividing the population of the
second group by that of the first. A score greater than one indicates that the
second group is larger and that contact is likely to be imbalanced in the direction
of the second dialect. A score less than one suggests an imbalance toward the first
dialect. Note that the population of the first relative to the second is different
from that of the second relative to the first; thus relative population estimates
one-way contact.
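
The relative population measure is a single division, but a short sketch makes its one-way character concrete. The function names and the population figures in the usage note are invented for illustration.

```python
def relative_population(first, second):
    """Population of the second group divided by that of the first."""
    return second / first


def likely_imbalance(first, second):
    """Direction in which contact is likely to be imbalanced."""
    score = relative_population(first, second)
    if score > 1:
        return "toward the second dialect"
    if score < 1:
        return "toward the first dialect"
    return "no imbalance suggested"
```

For two hypothetical groups of 200 and 500 speakers, the score for the first relative to the second is 2.5, while the score for the second relative to the first is 0.4. The two directions of a pair therefore receive different values, which is what makes the measure an estimate of one-way contact.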

Perhaps a more meaningful measure is relative density of population. Such a
measure takes into account the location of neighboring villages and the possible
effect they have on attracting interaction from other groups. To measure population
in terms of density, rather than actual number of inhabitants, has the effect of
measuring population with respect to the system rather than in isolation (see
Section 6.1.3). This technique is illustrated with the data in Appendix 2.1.4.
The results for the Santa Cruz Island data show that density of population is a
slightly better predictor of intelligibility than population (Figure 6.13), but not
significantly better.

In measuring the contribution of community facilities to contact, the first
step is to plot the location of the facilities within the area of the dialect system
on a map. Much of this information may already be on existing maps. Much of it can
be gathered from the simple observations of the investigator. To ensure a thorough
job, however, it is generally necessary to ask questions in the villages to
determine where each of these facilities is located. Some of the facilities to take
note of are: churches, schools, stores, markets, clinics, hospitals, government
offices, police stations, plantations, factories, truck or bus depots, airstrips,
and harbors. After these data are collected, a simple measure of the relative
command over services of dialects in the system can be obtained by totaling up the
number of facilities in each village or dialect area. One could then hypothesize
that the direction of contact will be from the dialects with fewer facilities to the
ones having more.

A more refined measure can be obtained by determining the domain, or area of
influence, of each community facility. This is done by asking at each village where
they go to obtain the goods and services they require. That is, at a particular
village ask, "Where do you attend church?", "Where do you go to market?", "Where do
you go to buy store goods?", "Where do your children go to school?", and so on. A
method of tabulating the data is presented at the end of this section.

Gathering data on cultural associations is more difficult because it requires
some investigation of the cultural traditions in the area. Marriage is probably the
easiest kind of association to study. One approach to studying marriage ties is to
treat them as an indicator of two-way reciprocal interaction. In this approach one
notes the presence of ties between a pair of speech communities without regard to
the direction the tie might take in terms of residence of the married couple. The
simplest way to question and record responses is to use yes-no questions. For
instance, "Is there anyone from this community who is married to someone from that
community?" The answers will be yes or no and these can be recorded in a table as
ones or zeros. For the opposite extreme of complexity, one could take a census
approach and count or estimate the actual number of marriages that link each pair of
villages. This would involve talking to every couple in a village or to a
representative sample and finding out where the husband and wife are from. A level
of complexity which is midway between and which is probably the best for these
purposes is to record responses in some scale of degrees. In such an investigation
the question asked would be, "How many marriages are there between a person from
this community and a person from that community?" The investigator could judge the
response and score it on a three level scale such as "no marriage ties", "some
ties", or "many ties". Or a scale which approximates the number of marriages could
be used, such as the following four level scale: "zero", "one or two", "three to
five", or "six or more". Such scaling approaches are used when the data are not so
reliable and exhaustive that the investigator can be sure of the complete accuracy
of the subjects' responses.
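
As a small illustration, the four level scale can be applied to estimated counts
as in the Python sketch below. The numeric codes 0 to 3 and the function name are
assumptions made for the example; the text itself only names the four levels.

```python
def score_marriage_ties(estimated_count: int) -> int:
    """Map an estimated number of marriages onto the four level scale.

    Codes (an illustrative choice): 0 = "zero", 1 = "one or two",
    2 = "three to five", 3 = "six or more".
    """
    if estimated_count < 0:
        raise ValueError("a count of marriages cannot be negative")
    if estimated_count == 0:
        return 0
    if estimated_count <= 2:
        return 1
    if estimated_count <= 5:
        return 2
    return 3


# Hypothetical responses from one village about six other communities.
responses = [0, 1, 2, 3, 5, 12]
scores = [score_marriage_ties(count) for count in responses]
```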

By noting the place of residence of the married couples, one obtains a more
refined method of measuring marriage ties as one-way indicators of contact. Such an
approach should be preceded by some investigation into the marriage customs of the
people to find out if it is customary for the couple to live in the community of the
wife, of the husband, or of their own choosing. An understanding of the land and
inheritance rights people retain if they move away and how they keep claim to them
active might also be relevant. Investigation of the kinds of contact which result
from marriage ties would also be helpful, such as patterns of visiting between
families or the relationship of marriage ties to trading alliances and participation
in ceremonies. In treating marriage in a one-way fashion, a distinction must be
made between the village of residence and the village from which the spouse comes.
The kinds of questions used in investigating two-way marriage ties are refined by
incorporating place of residence. If place of residence is strictly prescribed as
being the original community of the husband or of the wife, then the questions can
be phrased in terms of sex rather than place of residence. For instance, "How many
women from here have married a man from that community?" Methods of tabulating
results are discussed at the end of this section.

Measuring other kinds of associations, such as trading, ceremonial, or
political alliances, requires preliminary investigation to discover the nature of
the association. Once the presence of a certain kind of interaction has been
established, it is possible to formulate questions that could be asked in the
dialect survey. "Who?" questions can be used to establish the fact of interaction.
"Where?" questions can be used to determine the direction of contact. "How many?"
and "How often?" questions can be used to estimate the degree of contact in each
direction.



The above methods are designed to gather data on interaction and contact. Data 
on attitudes can be gathered by changing the perspective of the questions. Rather 
than asking about the facts of past contact, ask about preference for contact with
one group as against another or ask for a value judgement concerning another
village. Callister (1977:201-2) and Collier (1977:260) both give lists of possible
questions to use. Miller (1977) describes methods of developing sociometric scales
and indexes which could be used to develop schemes for assessing interdialect
attitudes.

For all of the methods in which contact relations between speech communities
are investigated, the best way to organize and tabulate the data is to put them in a
two-dimensional matrix. To simplify the comparison of results from different lines
of investigation, all such matrices should be consistently labeled with respect to
the ordering of dialects and the orientation of the two dimensions. The data
matrices in Appendix 2.1 exemplify this kind of consistent labeling and should be
referred to for examples as these principles of labeling are discussed in the next
two paragraphs.

The dialects should be listed in an order which causes the values for highest
contact to cluster along the diagonal of the matrix and the values for lowest
contact to occur on the edges. Ascher and Ascher (1963) describe an algorithm
which orders matrices in this way. With this arrangement dialects which are
adjacent in the ordering have a high degree of contact and those which are separated
have lower degrees. In general the optimal ordering will be close to a geographic
one. Alphabetical orderings are to be avoided, because they fail to bring out the
natural ordering relationships which the data values themselves imply.

All matrices should be labeled with the same orientation of the two dimensions. 
I have adopted the convention of labeling the dialects along the left hand side 
(that is, the rows) as the origin and the dialects along the top (that is, the
columns) as the destination. In this way the movement implied in the contact
relations is read from left to right in the matrices. The following descriptive
labels for the dimensions can be used. For relative distance, the labels can be
"from" and "to". For community facilities they can be "domain" (or "users") and
"location". For marriages they can be "place of origin" and "place of residence",
or simply "women" and "men" if the prescribed pattern is one of residence in the
husband's community. For ceremonials they can be "visitors" and "host". "Origin"
and "destination" are another pair of labels that can be used to describe movement
of people. Since the contact of the origin group with the destination group is what
predicts the origin group's understanding of the destination group's dialect, in
an intelligibility matrix "hearer" corresponds to origin and "speaker" corresponds
to destination. After labeling, each of the contact measurements is written into
the matrix cell where the row for the origin group intersects the column for the
destination group.

These methods in which a two-dimensional matrix is filled in yield a large
number of data points. If there are n dialects, then there will be n-squared
possible measurements of contact between them. However, the relationships within
the whole dialect system can be summarized in terms of n measurements of the
relative attraction and motivation of dialects by summing the rows and columns of
the matrix. When the rows represent origins and the columns represent destinations,
then the sum of a row gives a measure of the overall motivation of that dialect
group to travel and make contact; the sum of a column gives a measure of the overall
attraction of that dialect. Comparing the sums of the rows gives an indication of
the relative motivation of the dialects within the system to make contacts; some
will show an outgoing nature while others will show a stay-at-home nature.
Comparing the sums of the columns indicates relative attraction; it will become
clear which dialects attract a lot of contact and which do not. This technique is
used with the Santa Cruz data in Appendices 2.1.8, 2.1.9, and 2.1.10.
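
The row and column summaries can be sketched in Python as follows. The matrix is
hypothetical (three dialects scored on an arbitrary contact scale), with rows as
origins and columns as destinations, following the labeling convention described
above; the function names are illustrative.

```python
# Hypothetical contact scores: rows are origins, columns are destinations.
dialects = ["A", "B", "C"]
contact = [
    [0, 2, 1],  # contact of A with A, B, C
    [1, 0, 0],  # contact of B with A, B, C
    [3, 2, 0],  # contact of C with A, B, C
]


def motivation(matrix):
    """Row sums: how much each origin group goes out to make contact."""
    return [sum(row) for row in matrix]


def attraction(matrix):
    """Column sums: how much contact each destination group draws."""
    return [sum(column) for column in zip(*matrix)]


motivation_by_dialect = dict(zip(dialects, motivation(contact)))
attraction_by_dialect = dict(zip(dialects, attraction(contact)))
```

In this invented example C is the most outgoing group (row sum 5) but attracts the
least contact (column sum 1), while B is the most stay-at-home group.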



6.1.3 Relations defined by interdependence 

A third characteristic of dialect systems is that the relations between pairs
of dialects are defined by the interdependence of all dialects. This
interdependence characteristic of systems is one which has been recognized and used
by linguists in defining grammatical systems. Halliday states that "if a new term
is added to the system, this changes the meaning of all the others" (1961:247).
Kenneth Pike defines system as a group of two or more units which enter into each
others' definitions (Pike and Pike 1977:139). For dialect systems this principle is
realized in at least two ways. These may be summarized in the observation that (1)
the measurement of distance is relative to the system, and (2) the learning effect
of contact is cumulative over the whole system.

We are used to measuring distance in absolute terms. For instance, in
measuring geographic distance we use absolute, universal units such as miles or
meters. In measuring linguistic distance we might use a standard measure such as
percentage of lexical forms which are noncognate. These absolute measures are
helpful when the observer stands outside the system and attempts to measure
distance. However, when the observer stands inside the system as though he were a
participant, the perception of distance begins to become relative. That is, the
distance from one point to another is perceived in relation to the distance of that
point to all others in the system. This distinction between the perspective of one
standing outside the system and that of one looking from the inside as a participant
is like Pike's distinction of the etic and emic perspectives (Pike 1967:37ff, Pike
and Pike 1977:483). For instance, when growing up in California I gained a view of
United States geography in which Chicago is situated midway between the East and
West coasts. However, when measured against an absolute scale, Chicago is three
times further from San Francisco than it is from Washington, D.C. As another
example, Americans generally perceive South America as being located directly south
of the United States. However, when measured against the absolute scale of global
longitude, the whole coastline of Chile turns out to be east of Washington, D.C.

The data from Santa Cruz Island (see Sections 6.3.4 and 6.3.5) give clear
evidence that distance measured with reference to perspective from within a dialect
system provides a much better explanation of communication than distance which is
measured by an absolute scale and makes no reference to the system.

Figure 6.1 gives a map of Santa Cruz Island illustrating this point. The map
illustrates the measurement of distance relative to two villages, Mbanua (BAN) and
Nanggu (NNG). Mbanua is the geographic center for this dialect system. It is
defined as such because it is, on average, nearer to all the other dialects in the
system than any other dialect. The loop which surrounds Mbanua is drawn at a radius
equal to the average distance from Mbanua to all other speech communities in the
dialect system. This distance is 180 minutes, or 3 hours, of traveling time. One
way of interpreting this loop is that from the standpoint of Mbanua, half of the
people on the island live within the loop, the other half live outside of it.
Nanggu, on the other hand, is the most peripheral village in the system. The
average distance from Nanggu to all other dialects is higher than that for any other
community. Again, an arc is circumscribed around Nanggu at a radius equal to the
average distance to all other dialects. This distance is 588 minutes, or nearly 10
hours, of traveling time.


The map in Figure 6.1 should illustrate the relative nature of distance on
Santa Cruz Island. A traveling distance of 180 minutes is a completely different
thing from the perspective of Nanggu than it is from the perspective of Mbanua. A
villager from Mbanua can meet half of the people on the island by traveling 180
minutes from home. However, if a villager from Nanggu travels 180 minutes he has
barely left home and still has a long way to go to meet anyone outside of the
neighboring village of Mbimba. My hypothesis is that for using distance to predict
interaction, the average distances to all other dialects in the system, that is, 180
minutes for Mbanua or 588 minutes for Nanggu, are roughly equivalent distances from
the perspective of the respective villages.
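
One way to make this idea concrete is to divide each absolute distance by the
average distance from the village doing the perceiving. The Python sketch below does
this; the normalization is an assumption made for illustration (the analysis itself
is reported in Section 6.3.4), and the traveling times are hypothetical.

```python
# Hypothetical traveling times in minutes between three villages (symmetric).
villages = ["A", "B", "C"]
travel = [
    [0, 60, 120],
    [60, 0, 180],
    [120, 180, 0],
]


def average_distance(times, i):
    """Mean traveling time from village i to every other village."""
    others = [t for j, t in enumerate(times[i]) if j != i]
    return sum(others) / len(others)


def relative_distance(times, i, j):
    """Distance from i to j as a multiple of i's own average distance."""
    return times[i][j] / average_distance(times, i)


a_to_c = relative_distance(travel, 0, 2)  # 120 minutes seen from central A
c_to_a = relative_distance(travel, 2, 0)  # the same 120 minutes seen from C
```

The same 120 minutes counts as farther than average for the central village A and
nearer than average for the peripheral village C, so the relative measure is one-way
even though the absolute measure is reciprocal.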

The hypothesis receives support from the analysis in Section 6.3.4 where it is
shown that geographic distance measured by an absolute scale explains
intelligibility with 67% accuracy while geographic distance measured by a relative
method explains intelligibility with 83% accuracy. Stated in another way, absolute
distance explains intelligibility with 33% error while relative distance explains
intelligibility with 17% error, only half as much.

The second aspect of interdependence is seen in the way that the learning
effects of contact are cumulative. When trying to explain communication, the degree
to which one dialect understands the speech of another does not depend simply on its
contact with that dialect. It depends also on the first dialect's contact with all
other dialects. This is because those dialects also bear some similarity to the
original target dialect. Therefore, contact with all other dialects contributes to
learning about the speech of another dialect.

An example of this principle can also be seen from the data of Santa Cruz
Island. In Figure 6.2 another map of the island is reproduced. In this case the
lexical similarity between four of the dialects is indicated: Neo (NEO), Lwowa
(LWO), Mbanua (BAN), and Nooli (NOO). The lexical similarity between Nooli and Neo
is only 59%. This is well below the level of similarity for which we normally
expect full understanding. Swadesh suggested that 81% similarity correlated with
the lower limit of intelligibility; many investigators in Papua New Guinea have
considered 70% or 75% more realistic for that (McElhanon 1971:134-5). (Note that
the simplified linear model in Figure 5.9 of Section 5.2.6 indicates an expected
intelligibility of 32% when lexical similarity is 59%.) However, intelligibility
tests showed that Nooli had full understanding of the speech of Neo. The parallel
data which were collected on interaction relations indicate no relations of direct
contact between the two villages. On the basis of this fact alone, we would then
expect that Nooli's contact with Neo would not be sufficient to boost
intelligibility to the full level. It is the interdependence principle which
explains the presence of intelligibility. The data on interactions do indicate that
Nooli has contact with Lwowa and Mbanua. This, then, accounts for the full
intelligibility measured for Nooli on the speech of Lwowa. Note now that Neo is
87% similar to Lwowa and 85% similar to Mbanua. This indicates that Nooli, through
its contact with Lwowa and Mbanua (as well as the other dialects in the same
vicinity), has learned up to 87% of the speech of Neo. Thus it is conceivable that
without any contact whatsoever with Neo they would be able to understand that
dialect at a level predicted by 87% lexical similarity. Actually, Neo shows some
similarity to all other dialects with which Nooli has contact. Thus, through
learning to understand differences from other dialects, they have at the same time
learned to understand many aspects of the speech spoken at Neo. Therefore, the
absence of direct contact between two speech communities is not sufficient evidence
to discount learning between the dialects. The learning of another dialect is
actually a function of contact with all other dialects and the similarity of those
dialects with the target dialect. Gillian Sankoff observes this same phenomenon
among the Buang of Papua New Guinea (1968:184; 1969:848).

Figure 6.2 Indirect contact on Santa Cruz Island
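
The reasoning of the Nooli and Neo example can be sketched in Python using the
similarity percentages quoted above. Taking the maximum over the contacted dialects
is a deliberately simple assumption made for the illustration; it reproduces the
"up to 87%" reading but is not offered as the exact function relating contact and
learning.

```python
# Lexical similarity to Neo, in percent, as quoted in the text.
similarity_to_neo = {"LWO": 87, "BAN": 85, "NOO": 59}

# Dialects with which Nooli is reported to have direct contact.
nooli_contacts = ["LWO", "BAN"]


def effective_similarity(own_similarity, contacts, similarity_to_target):
    """Upper bound on what a hearer may know of the target dialect.

    Assumption for illustration: whatever is shared between a well-known
    contacted dialect and the target counts as already learned, so the
    best-placed contacted dialect sets the bound.
    """
    learned = [similarity_to_target[name] for name in contacts]
    return max([own_similarity] + learned)


nooli_on_neo = effective_similarity(
    similarity_to_neo["NOO"], nooli_contacts, similarity_to_neo
)
```

With no contacts at all the function falls back to the 59% that Nooli shares with
Neo directly.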

6.1.4 Common dependence on a center

A fourth characteristic of dialect systems is that the relations between
dialects are not random; they are subject to the common influence of a center. The
solar system gives a good example of this property. The motion of the planets can
be understood only in terms of the common gravitational pull of the sun. The motion
of a moon within the solar system can be partly explained through the force exerted
by the sun but requires the introduction of a second force, the gravitational pull
of the host planet, to explain the small orbits which are superimposed on the huge
orbit around the sun.


Dialect systems, too, are characterized by these common and central forces
which explain the overall pattern of interaction. In Section 6.1.2 the imbalanced
pattern of interaction between two dialects was discussed. When many such pairwise
relations are viewed simultaneously, then an overall pattern of a single dialect
dominating interaction with the surrounding dialects may be seen. This kind of
dominance (attraction) defines centers, and thus dialect systems are viewed as
centered systems. The center within a dialect system is the primary force in
explaining patterns of communication within that system. As in the solar system,
there may also be secondary (or even tertiary, and so on) centers. These subsidiary
centers would be used in addition to the primary center to explain relations in a
specific subsystem of the whole.

A center is defined by recourse to a number of factors. For a given system,
the communication center would be the dialect most widely understood. The
linguistic center would be the dialect having the highest average linguistic
similarity to all other dialects. The geographic center would be the dialect having
the lowest average distance to all other dialects. The demographic center would be
the dialect having the greatest population. The center with respect to community
facilities would be the dialect having the greatest collection of facilities. The
center for cultural associations would be the dialect attracting the greatest number
of married couples, attracting the greatest number of people to ceremonials, having
the greatest concentration of traditional wealth, or having the greatest political
power. All of these factors contribute to defining a central dialect. An example
is given in Section 6.3.2 where the central dialect for Santa Cruz Island is
determined. For three other examples of defining a central dialect and a discussion
of the general topic, see the paper by Joy Sanders (1977). In Section 6.3.5 (Figure
6.13) the hypothesis that relations to the central dialect define relations over the
whole system is tested. The predicting models thus derived are 88% accurate on
average.
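
Two of these definitions translate directly into small computations, sketched
below in Python with hypothetical data. The sketch only shows that different
criteria can point to different dialects; how the criteria are weighed against each
other is not specified here and is left open.

```python
# Hypothetical data for three dialects.
names = ["A", "B", "C"]
distance = [  # traveling time in minutes, symmetric
    [0, 60, 120],
    [60, 0, 180],
    [120, 180, 0],
]
population = {"A": 300, "B": 500, "C": 150}


def geographic_center(labels, dist):
    """Dialect with the lowest average distance to all other dialects."""
    def average(i):
        others = [dist[i][j] for j in range(len(labels)) if j != i]
        return sum(others) / len(others)

    return labels[min(range(len(labels)), key=average)]


def demographic_center(pop):
    """Dialect with the greatest population."""
    return max(pop, key=pop.get)


geo = geographic_center(names, distance)
demo = demographic_center(population)
```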

6.2 A general model for explaining communication


In this section a general model for explaining communication is developed. In
Section 6.3 it is tested with data from Santa Cruz Island. Many possibilities for
filling in the general model are suggested; at this point it is too soon to propose
which are best. Therefore, in Section 6.3 many of the proposals are tested against
the field data and the results are reported. These serve to indicate the potential
of the general model.

The model is developed in two parts. First, Section 6.2.1 discusses the
relationship between linguistic similarity and contact in predicting
intelligibility. A model involving those three variables alone is given. Second,
Section 6.2.2 concentrates on the contact variable and develops a model for
predicting values of the contact factor to use in the main formula.

6.2.1 Predicting intelligibility 

The basic model suggested in Section 4.3 states that intelligibility has two
components, a similarity-based component and a contact-based component. Another way
of saying the same thing is that intelligibility is based on both linguistic factors
and social factors. In mathematical terms, one would say that intelligibility is a
function of linguistic similarity and contact. That is,

    I = f(L, C)

    where I = intelligibility,
          L = linguistic similarity, and
          C = contact

The goal of this section is to specify the manner in which these two variables
interact to explain intelligibility.

All previous attempts to specify a model for intelligibility have suggested
that the function relating linguistic similarity and contact is an additive one
(Casad 1974:191, Stoltzfus 1974:46, Collier 1977:256). That is,

    I = f(L) + g(C)

This model states that intelligibility is equal to the effect of linguistic
similarity, or the effect of contact, or the sum of both. When there is no contact,
the C factor is zero and intelligibility is based strictly on linguistic similarity.
When there is no similarity, the L factor is zero, and intelligibility is based
strictly on contact. When there is both similarity and contact, intelligibility is
the sum of their effects. When contact is favorable, the effect will be a boost in
intelligibility above the level predicted by linguistic similarity alone. When
contact is not favorable, the C factor could have a negative value which would have
the effect of limiting intelligibility to a level lower than that expected on the
basis of similarity. (This model therefore accounts for the cases reported by Wolff
1959.)

This model has the disadvantage that it puts no ceiling on the possible effect
of contact. If the similarity and contact factors were both high, a percentage of
intelligibility beyond 100% would probably be predicted. This is because the model
specifies no interaction between the variables; a given degree of contact increases
intelligibility by the same amount regardless of the degree of similarity. This
cannot actually be the case, however. For instance, assume that 75% similarity
predicts 50% intelligibility when there is no contact and that x amount of contact
raises intelligibility by 40% to 90%. But suppose that 90% similarity predicts 80%
intelligibility when there is no contact; then x amount of contact cannot raise
intelligibility by another 40%. By definition, the degree of intelligibility cannot
exceed 100%. Therefore, the degree of understanding which a certain amount of
contact brings about must be restricted by the amount of improvement which is still
possible.
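
The arithmetic of this example can be checked with a few lines of Python. The
second function is only one simple way of restricting the boost to the improvement
still possible, namely by treating the effect of contact as a fraction of the
remaining room; it is an assumption made for illustration, and both function names
are illustrative.

```python
def additive(base_pct: float, boost_pct: float) -> float:
    """Additive model: the same boost whatever the starting level."""
    return base_pct + boost_pct


def bounded(base_pct: float, fraction_of_room: float) -> float:
    """Boost expressed as a fraction of the improvement still possible."""
    return base_pct + fraction_of_room * (100 - base_pct)


# Figures from the text: a base of 50% raised by 40 points gives 90%.
low_start = additive(50, 40)

# The same 40 points added to a base of 80% overshoots the ceiling.
high_start = additive(80, 40)

# A 40-point gain from a base of 50% uses 40/50 of the available room.
fraction = 40 / (100 - 50)
low_start_bounded = bounded(50, fraction)
high_start_bounded = bounded(80, fraction)
```

Under the additive model the second case predicts 120%, which is impossible; under
the bounded version the same contact yields 96%.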


This refinement to the model can be formulated as follows: 

    I = f(L) + g(C(100-L))

Here linguistic similarity is measured as a percentage. The value (100 - L) then
gives the percentage of non-similarity. This model suggests that the learning (and
thus intelligibility boost) brought about by the contact factor is limited to that
portion of the language which is not already similar.

In order to use least-squares regression techniques to test the model, the
model would be reformulated as follows:

    I = b_0 + b_1 L + b_2 C(100-L)

Multiple regression analysis would then yield values for the three b constants in
the formula.
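
A minimal sketch of such a fit is given below in plain Python, solving the normal
equations by Gaussian elimination. The observations are synthetic: they are
generated without noise from arbitrarily chosen constants (-40, 1.2, 0.9) purely so
that the recovered values can be checked. Nothing here is field data, and all names
are illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(observations):
    """Least-squares b_0, b_1, b_2 for I = b_0 + b_1 L + b_2 C(100-L).

    observations: list of (similarity_pct, contact, intelligibility_pct).
    """
    rows = [[1.0, sim, con * (100 - sim)] for sim, con, _ in observations]
    targets = [intel for _, _, intel in observations]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic (similarity %, contact scaled 0..1) pairs, for illustration only.
points = [(60, 0.0), (60, 0.5), (70, 1.0), (80, 0.2), (90, 0.0), (75, 0.8)]
true_b = (-40.0, 1.2, 0.9)
synthetic = [
    (sim, con, true_b[0] + true_b[1] * sim + true_b[2] * con * (100 - sim))
    for sim, con in points
]

b0, b1, b2 = fit(synthetic)
```

In practice a statistics package would be used instead of a hand-written solver;
the point of the sketch is only that the interaction term C(100-L) enters the
regression as a single computed predictor.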


The techniques of least-squares regression analysis are not appropriate for the
data from Santa Cruz Island because step functions rather than continuous functions
are required to predict intelligibility (see Section 6.3.3). Therefore, another
formulation of the basic model is tested in Section 6.3. It is as follows:

    F = L + C(100-L)
    I = f(F)

In the first place, linguistic similarity and contact are combined directly (with no
weighting factors or additive constants) to predict the "linguistic familiarity", or
F. The familiarity is a percentage estimate of what portion of the dialect of the
speaker is familiar to the hearer, either through similarity or contact or both. In
order to prevent F from exceeding 100%, the C factor must be scaled to a range of
zero to one. As long as L ranges from 0% to 100% and C ranges from zero to one, F
will range from 0% to 100%. Intelligibility is then predicted as a function of
familiarity. In Section 6.3, step functions are used to predict intelligibility
from familiarity.
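
The familiarity formula and a step function can be sketched in Python as follows.
The formula is the one given above; the thresholds and labels of the step function
are hypothetical placeholders, since the actual steps are derived from the data in
Section 6.3.

```python
def familiarity(similarity_pct: float, contact: float) -> float:
    """F = L + C(100 - L), with L in percent and C scaled from zero to one."""
    if not 0 <= similarity_pct <= 100:
        raise ValueError("similarity must lie between 0 and 100 percent")
    if not 0 <= contact <= 1:
        raise ValueError("contact must be scaled to the range zero to one")
    return similarity_pct + contact * (100 - similarity_pct)


# Hypothetical thresholds, for illustration only (highest first).
STEPS = [(90, "full"), (70, "partial")]


def intelligibility_level(familiarity_pct: float) -> str:
    """Step function from familiarity to a discrete intelligibility level."""
    for threshold, label in STEPS:
        if familiarity_pct >= threshold:
            return label
    return "sporadic"


no_contact = familiarity(59, 0.0)    # similarity alone
some_contact = familiarity(59, 0.7)  # contact fills part of the remaining 41%
full_contact = familiarity(59, 1.0)  # contact fills all of the remaining 41%
```

Because the contact term multiplies only the non-similar portion, F can never
exceed 100% however the two inputs are combined.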

6.2.2 Predicting interaction and contact

Contact can be measured by some of the techniques suggested in Section 6.1.2.3
and substituted straight into the above formulas; this is done in Section 6.3.5.
A complementary method is to predict contact; this is done in Section 6.3.5.2. The
prediction can be based on some of the factors underlying contact, such as distance
between speech communities, population, relative distance from the center, and so
on. The advantage of predicting from these factors is that they can be measured
from maps and census data before going into the field. Predictions can also be
based on overall attraction and motivation relations computed from the raw data
(Section 6.1.2.3). When predictions are based on general patterns observed over the
whole system, the estimated values may turn out to be better than the observed ones
for at least two reasons: (1) the predicted values may afford a more refined
measurement if the original scale had only two or three discrete levels, and (2) the
predicted values may smooth over gross measurement errors in the raw data. The
Santa Cruz Island data give evidence that contact predicted on the basis of overall
attraction and motivation relations is a better predictor of intelligibility than
the raw pairwise contact measurements (Section 6.3.5).


Models for predicting interaction and contact are not new to social science.
They have been used by sociologists to explain human interaction for nearly one
hundred years. Gerald Carrothers (1956) gives an extensive historical review of
what are probably the most promising models, "gravitational" models. Quoting from
Carrothers (1956:94):

    In general terms, the gravity concept of human interaction postulates
    that an attracting force of interaction between two areas of human
    activity is created by the population masses of the two areas, and a
    friction against interaction is caused by the intervening space over which
    the interaction must take place. That is, interaction between the two
    centers of population concentration varies directly with some function of
    the population size of the two centers and inversely with some function of
    the distance between them.

This is, of course, nothing more than an analogy to Newton's law of universal
gravitation. The direct analogy to Newton's law is stated mathematically as follows
(Carrothers 1956:95):

    F_{ij} = G P_i P_j / d_{ij}^2

    where F_{ij} = the force of interaction between communities i and j;
          G = a constant;
          P_i, P_j = the populations of the two communities; and
          d_{ij} = the distance separating i and j.

Following the analogy from physics, the energy of interaction between the two
communities would be given by multiplying the force times the distance. The
result would have simply the distance, rather than the distance squared, in the
denominator. For the distance term, not only geographic distance but also
linguistic distance could be used to predict interaction. This would be based on an
assumption that the greater the linguistic difference between dialects, the less the
interaction that might be expected.

The above formula predicts reciprocal interaction. Of more interest to us than
reciprocal interaction is one-way contact. In the sociological analogy to physics
there is such a measure. It is termed "potential of population" (based on an
analogy to potential energy) and was developed by Stewart (1941, Carrothers
1956:96). It suggests that the potential for interaction of an individual at i with
the population of community j will be greater as the population of j is greater,
and will be less as the distance between i and j increases. Stated mathematically,
the prediction of contact in these terms would be as follows:

    C_{ij} = P_j / d_{ij}

    where C_{ij} = the potential contact of i with speech community j;
          P_j = the population of community j; and
          d_{ij} = the distance separating i and j.



In this formulation, the population of j serves essentially as a measure of the
attraction of j. The assumption is, the larger j is, the more likely it is to
attract contact. If we rewrite the formula to replace population with attraction,
the result is a more general model which can have wider application in predicting
language contact.
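
The one-way character of this measure can be seen in a short Python sketch. It
takes potential contact to be the population of the destination divided by the
distance, which is the simplest reading of the description above and is an
assumption of the example; the populations and the distance are hypothetical.

```python
def potential_contact(destination_population: float, distance: float) -> float:
    """Potential contact of an individual at i with community j.

    Assumed form: population of j divided by the distance from i to j.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return destination_population / distance


# Hypothetical communities 50 units of distance apart.
population = {"A": 200, "B": 500}
distance_a_b = 50

a_with_b = potential_contact(population["B"], distance_a_b)  # A drawn to B
b_with_a = potential_contact(population["A"], distance_a_b)  # B drawn to A
```

Although the distance is the same in both directions, the smaller community A has
more potential contact with B than B has with A.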

Another factor can be added to the model; this is the motivation of group i to
have contact. The contact of i with the speech of community j does not depend
solely on j's attraction, but also on i's motivation to interact. Some communities
may be eager and outgoing; others may be cautious or reclusive.

Another refinement can be made to the model. When measuring the contact of one
group with another, the distance between them can be measured relatively from the
perspective of the group which is making the contact, rather than in absolute terms.
The refined model for predicting contact is therefore,

    C_{ij} = A_j M_i / d_{ij}

    where C_{ij} = the contact of community i with the speech of community j;
          A_j = the attraction of community j;
          M_i = the motivation of community i; and
          d_{ij} = the distance from i to j as perceived by i.


There are many possibilities for specific factors to plug into the model.
Distance could be geographic or linguistic, or a combination of both. The
following factors could be used to estimate the attraction of a group: population
size, population density, nearness to the center of the dialect system, or the
number of community facilities located within that group. Motivation could be
estimated as the inverse of any of the above factors. That is, as the population of
the group increases, we might expect its motivation to make contacts to decrease.
As a group is nearer to the center of the dialect system, we might expect its
motivation to make contacts to diminish, and so on. Another source of estimates for
attraction and motivation is the sums of the rows and columns of the raw data
matrices (Section 6.1.2.3). It was already suggested that the sums of the rows and
columns reflect the relative motivation and attraction of the dialects within the
whole dialect system.
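
As a rough illustration of how such factors might be plugged into the model, the
sketch below uses population as the estimate of attraction and the inverse of
population as the estimate of motivation, which is one of the options mentioned
above. The function name, the populations, and the distance are hypothetical; they
are not taken from the Santa Cruz data.

```python
def predicted_contact(attraction: float, motivation: float, distance: float) -> float:
    """Contact of community i with the speech of community j.

    attraction -- attraction of community j
    motivation -- motivation of community i
    distance   -- distance from i to j as perceived by i (must be positive)
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return attraction * motivation / distance


# Hypothetical populations: a small village i and a large village j.
population_i, population_j = 100.0, 400.0
distance_ij = 2.0  # hypothetical distance between them, in any consistent unit

# Attraction is estimated by population, motivation by the inverse of population.
contact_i_with_j = predicted_contact(population_j, 1 / population_i, distance_ij)
contact_j_with_i = predicted_contact(population_i, 1 / population_j, distance_ij)
```

With these assumed estimates the small village is predicted to have more contact
with the speech of the large one than the reverse, so the predicted relation is
one-way rather than reciprocal.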

Another possible perspective on attraction and motivation surfaced in the
discussion of patterns of interaction (Section 6.1.2.2). There it was suggested
that contact relations are the result of at least two factors, a need factor and an
attitude factor. The degree to which $i$ attracts $j$ could be measured in terms of $j$'s
need to interact with $i$, and $j$'s motivation to interact could be measured in terms
of its attitude toward $i$.

The possibilities are numerous and at this point no dogmas concerning the best
approach can be suggested. In the next section of this chapter some of the above
proposals are tested against field data. These results serve to indicate the
potential of the general model.

6.3 Explaining communication on Santa Cruz Island

The models developed in Section 6.2 are now used to explain communication on
Santa Cruz Island in the Solomon Islands. The island is about 30 miles from end to
end and has a total population of around 3000. In 1977 I conducted a dialect
intelligibility survey there following the method described in Section 2.1. At the
same time, Richard Buchan conducted a lexicostatistic survey. The results of both
are reported in Simons 1977a.

The organization of this study is in seven parts. Section 6.3.1 reviews the
data. Section 6.3.2 is an analysis which locates the center of the Santa Cruz
dialect system. Section 6.3.3 reviews the statistical methods used to evaluate the
models for explaining communication. Then in Sections 6.3.4 and 6.3.5 many
different models for explaining communication are proposed and tested. Section
6.3.6 summarizes the successive refinements which were achieved at different stages
in the modeling process. Finally, Section 6.3.7 draws conclusions, both of a
specific nature for explaining communication on Santa Cruz Island and of a general
nature for explaining communication elsewhere in the world.

6.3.1 The data

A complete description of the data for this study and all the data tables are
found in Appendix 2.1. At this point I give only a brief review of what the data
contain. The numbers in parentheses refer to the Appendix in which the details are
found.

Intelligibility was tested in thirteen villages during the dialect survey.
These thirteen villages represent the main dialects on Santa Cruz Island. In the
first place, three items of information about each of the thirteen dialects are
given: the villages which comprise them (2.1.1), the population of the dialects
(2.1.2), and the density of population of the dialects (2.1.4). The remaining data
consist of pairwise measurements of relationship between the dialects. These
measurements include: the geographic distance between the dialects measured as
traveling time (2.1.3), the lexical similarity between the dialects measured as
cognate percentages (2.1.5), the lexical distance between the dialects measured as
percentages of non-cognates (2.1.6), the intelligibility between dialects measured
as described in Section 2.1 (2.1.7), local opinions about intelligibility (2.1.8),
the contact of dialects through yearly church festivals (2.1.9), and the contact of
dialects through marriage ties (2.1.10). Geographic and lexical distance are
measured relatively as well as absolutely. In Appendix 2.1.11 a complete matrix of
estimated intelligibility is given. The estimations are based on the final
predicting model developed in Section 6.5.

6.3.2 The center of the Santa Cruz dialect system

The centrality of a dialect can be measured in several ways. It could be
geographically central. It could be a center of population. It could be
linguistically or culturally central. All these factors must be considered in
defining a center. Ideally the evidence from many aspects of centrality will
converge on a single answer. For Santa Cruz Island it does.

The center for the Santa Cruz dialect system is Mbanua (BAN). The evidence is
summarized in the following list in which the first and second most central dialects
are listed for each of the kinds of data presented in Appendix 2.1.

The following criteria for defining centers were used: for population and density,
the greatest population and density; for geographic and lexical distance, the lowest
average distance; for opinions, festivals, and marriage, the greatest attraction.

The evidence clearly points to Mbanua as the central dialect. A further bit of
evidence which has not yet been mentioned also concurs. This is the evidence of
community facilities. Most of the dialects are self sufficient in terms of
churches, stores, and primary schools. The one facility which influences
interaction on an island-wide scale is the administrative headquarters for the whole
Eastern Outer Islands council area of the Solomon Islands. At this headquarters are
located the only hospital, government offices, post office, police station, and
airstrip for the island. The major ship wharf is there as well. This institutional
center is located midway between Mbanua and Lwowa.

In Section 6.5 when models which predict contact on the basis of distance from
the center are tested, the data used are the distances of the dialects from Mbanua.
These are found in Appendix 2.1 by taking the BAN columns of Table 2.1.3 (geographic
distance) and Table 2.1.6 (lexical distance).

6.3.3 The statistical method

Unfortunately, the statistical methods used in the analysis of lexical
similarity and intelligibility (Chapter 5) are not applicable to these data. This
is because intelligibility was measured on a discrete point scale, rather than on a
continuous percentage scale. There are two reasons why the least-squares
methods of correlation and regression analysis used previously are not applicable to
the current analysis: (1) the techniques are not appropriate for ordinal scale
variables, and (2) the functions which predict intelligibility are step functions
rather than linear functions.

Statisticians distinguish between ordinal level of measurement and interval
level of measurement (Stevens 1946). When a variable is measured on an ordinal
scale, each category has a unique position with respect to the other categories.
That is, it is higher in value than some categories and lower in value than the
rest. However, ordering is the only mathematical property of such a measurement
scale; relative distance between categories is undefined. On the other hand, when a
variable is measured on an interval scale an additional property characterizes the
measurements. Not only are the categories ordered; the distances between the
categories are defined in terms of fixed and equal units. Of the data described in
Section 6.3.1, half of the variables (intelligibility, opinions, church festival
attendance, and marriage ties) are measured on an ordinal scale. The techniques of
correlation and regression analysis used in Chapter 5 require that the variables be
measured on an interval scale.



[List for Section 6.3.2, giving the first and second most central dialects for each
kind of data: Population; Density; Geographic distance; Lexical distance;
Intelligibility opinions; Festival attendance; Marriage residence]

In a linear function some amount of change in the independent variable always
results in a proportional amount of change in the dependent variable. In a step
function, change in the independent variable does not always result in change in the
dependent variable. Rather, the dependent variable holds a constant value for
specific ranges of the independent variable, and when the dependent variable changes
it does so suddenly rather than gradually. When a step function is plotted on a
graph, the appearance is that of a flight of stairs. Figure 6.3 illustrates a
linear function and a step function. The techniques of correlation and regression
analysis used in Chapter 5 apply when the underlying function is a linear one. In
the current data, since intelligibility is measured on a four-point scale, functions
which predict intelligibility will be step functions. Even the non-parametric
correlation techniques (such as Spearman or Kendall rank-order correlations) which
require no assumptions of linearity or interval scale variables do not handle step
functions. With these techniques, a perfect step function yields less than perfect
correlation.

For the two reasons just presented, it was necessary to use different methods
of evaluating models for predicting intelligibility on Santa Cruz Island. In the
previous chapter, the percentage of explained variation was the main measure used to
evaluate the adequacy of a model. Here, the percentage of prediction accuracy is
used. This percentage is based on the ratio of prediction accuracy. In the
simplest case, this ratio would be obtained by dividing the number of correct
predictions by the total number of predictions, which equals the number of cases.

However, in this formulation no account is made for how far off the incorrect
predictions are. It would be good to distinguish between models with the same ratio
of incorrect predictions but in which the errors are small in one model and large in
another. The rationale here is that if administrative decisions had to be based on
one of the two models, it would be better to use the one with the smaller errors.
To do this, the number of correct predictions is decremented for predictions which
are very wrong. In the Santa Cruz study understanding is at one of three levels:
full intelligibility, partial intelligibility, or sporadic recognition. When a
prediction is incorrect it can be off by one level or at most by two levels. When a
prediction is off by two levels, one is subtracted from the number of correct
predictions. Thus the ratio of prediction accuracy becomes,

$$ \text{ratio of prediction accuracy} = \frac{\text{correct predictions} - \text{predictions off by two levels}}{\text{total predictions}} $$

The same relationship can be formulated in another way using the concept of
deviations. When a prediction is correct, the deviation from the measured value is
zero; when it is off by one level, the deviation is one; when it is off by two
levels, the deviation is two. The following formulation is therefore equivalent to
the preceding one:

$$ \text{ratio of prediction accuracy} = \frac{\text{total predictions} - \text{sum of deviations}}{\text{total predictions}} $$
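
The equivalence of the two formulations can be checked with a short sketch. It
assumes the three levels are coded as integers (0 for sporadic recognition, 1 for
partial intelligibility, 2 for full intelligibility); the function names and the
eight measured and predicted values are made up for the example.

```python
def accuracy_from_deviations(measured, predicted):
    """Ratio of prediction accuracy: (total - sum of deviations) / total."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    deviations = [abs(m - p) for m, p in zip(measured, predicted)]
    return (len(deviations) - sum(deviations)) / len(deviations)


def accuracy_from_counts(measured, predicted):
    """Same ratio: (correct - predictions off by two levels) / total."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(measured, predicted))
    correct = sum(1 for m, p in pairs if m == p)
    off_by_two = sum(1 for m, p in pairs if abs(m - p) == 2)
    return (correct - off_by_two) / len(pairs)


# Hypothetical levels: 0 = sporadic, 1 = partial, 2 = full.
measured = [2, 2, 2, 1, 1, 0, 0, 2]
predicted = [2, 2, 1, 1, 0, 0, 2, 2]

ratio = accuracy_from_deviations(measured, predicted)
percentage = 100 * ratio
```

The two functions agree only because a three-level scale allows a deviation of at
most two; on a scale with more levels the deviation form would be the natural one to
generalize.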

The percentage of prediction accuracy is then obtained by multiplying the ratio by
one hundred.

Figure 6.3 A linear function and a step function

In the previous chapter, the computational method of least squares was used to
find the regression line and thus define the parameters of the model. In this
analysis, since those techniques are not appropriate, the step function which best
fits the data is found by inspection. First, a scattergram of the data is plotted.
Then the cutoff points for the steps are located such that the percentage of
prediction accuracy is maximized by minimizing the sum of the deviations. After
fitting the step function, the sum of the deviations is totaled and the ratio and
percentage of prediction accuracy are computed. The many scattergrams in Appendix
2.2 illustrate the technique.
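
The fitting described above was done by inspection of scattergrams. Purely as an
illustration, the same criterion (minimize the sum of the deviations) can be applied
mechanically by trying every pair of cutoff points. The sketch below does this for
a predictor that rises with intelligibility, such as lexical similarity. The
function names, the coding of the levels as 0, 1, and 2, and the data are
assumptions made for the example, not the Santa Cruz measurements.

```python
import math
from itertools import combinations_with_replacement


def step_predict(x, low_cut, high_cut):
    """Three-level step function: 0 below low_cut, 1 below high_cut, else 2."""
    if x < low_cut:
        return 0
    if x < high_cut:
        return 1
    return 2


def fit_step_function(xs, levels):
    """Return (low_cut, high_cut, sum_of_deviations) with the smallest sum.

    Candidate cutoffs are the observed predictor values plus infinity, which
    covers every distinct way of splitting the data into three ordered steps.
    """
    candidates = sorted(set(xs)) + [math.inf]
    best = None
    for low_cut, high_cut in combinations_with_replacement(candidates, 2):
        total = sum(abs(step_predict(x, low_cut, high_cut) - level)
                    for x, level in zip(xs, levels))
        if best is None or total < best[2]:
            best = (low_cut, high_cut, total)
    return best


# Hypothetical cognate percentages and measured levels (0, 1, or 2).
similarity = [30, 45, 50, 62, 70, 75, 81, 90]
measured = [0, 0, 1, 1, 2, 1, 2, 2]

low_cut, high_cut, sum_of_deviations = fit_step_function(similarity, measured)
ratio = (len(measured) - sum_of_deviations) / len(measured)
```

When several pairs of cutoffs tie, the sketch keeps the first one found; fitting by
eye would leave the same choice to the investigator's judgment.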

Tests of significance are used to compare the accuracy of different models.
The exact ratios of prediction accuracy are compared with a two-by-two chi-square
test. Two types of tests are used. A two-tailed test is used to test the
hypothesis that two ratios are unequal when there is no reason to suspect which one
should be greater. A one-tailed test is used to test the hypothesis that one
particular ratio is significantly greater than or less than another one.
(One-tailed tests are always used unless specifically stated otherwise.) The result
of the one- or two-tailed chi-square test is a significance level. The significance
level is the probability that the two ratios are actually equal and that the
observed difference could be due to chance alone. If this probability is very low,
then we feel safe in accepting the hypothesis that the ratios are actually different
(and that one is greater than the other in the case of a one-tailed test). Social
scientists generally agree on the .05 level as being significant. When accepting a
hypothesis at the .05 level, one is saying that there is no more than a one in
twenty chance that it is wrong. In the discussions which follow, differences at
the .05 level or better will be called "significant". Differences at the .01 level
or better will be called "very significant". Differences at the .001 level or
better will be called "highly significant". Occasionally differences just over
the .05 level will be referred to as "nearly significant".
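
A test of this kind can be sketched in a few lines. The version below assumes an
uncorrected Pearson chi-square with one degree of freedom, and takes the one-tailed
level to be half of the two-tailed level; the text does not spell out either detail,
so both are assumptions, and the function names are invented for the example. The
two calls use ratios from the first two rows of Figure 6.4.

```python
import math


def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square (no continuity correction) for two accuracy ratios."""
    a, b = correct_a, total_a - correct_a
    c, d = correct_b, total_b - correct_b
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / denominator


def significance(correct_a, total_a, correct_b, total_b, one_tailed=True):
    """Significance level of the difference between two ratios.

    For one degree of freedom the upper tail of chi-square equals
    erfc(sqrt(chi2 / 2)). Halving it for a one-tailed test is only meaningful
    when the observed difference lies in the hypothesized direction.
    """
    chi2 = chi_square_2x2(correct_a, total_a, correct_b, total_b)
    two_tailed = math.erfc(math.sqrt(chi2 / 2))
    return two_tailed / 2 if one_tailed else two_tailed


# Worst case 44/78 against geographic distance 52/78 and lexical similarity 60/78.
p_geographic = significance(44, 78, 52, 78)
p_lexical = significance(44, 78, 60, 78)
```

Under these assumptions the two levels come out at roughly .09 and .003, which
agrees with the significance column of Figure 6.4.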

6.3.4 Single variable models

In Appendix 2.2 the scattergrams and the best fit step functions for single
variable predictors of intelligibility are given. Six single variable predictors
are tried: geographic distance, lexical similarity, opinions about intelligibility,
church festival attendance, marriage ties, and predicted marriage residence. The
accuracy of these single variable predictors is summarized in Figure 6.4. In this
table, the percentage of prediction accuracy for the single variable predictors is
given in the "With model" column. In parentheses, following the percentages, are
the exact ratios of prediction accuracy. This format is followed in all remaining
tables: percentages followed by the exact ratios. In the first column of numbers
in the table, the percentage of prediction accuracy for the worst case model is
given. This is the minimum percentage of accuracy that would be obtained if the
relationship between intelligibility and the predictor were due to chance alone and
not to any correlation between the two. In every case, the worst case occurs if the
model predicts full intelligibility for all values of the independent variable. This
is because there are many more cases of full intelligibility than of the other two
levels. Since the strategy in finding the best fit step function is to maximize
prediction accuracy, it can never be worse than what would be given by predicting
only full intelligibility.

Note that in every case, the prediction accuracy for the model is greater than
the worst case. The final column of the table gives significance levels for tests
on the hypothesis that the single variable models are significantly better than
chance associations (that is, significantly better than the worst case). Three of
the predictors are 77% to 79% accurate and are definitely significant: lexical
similarity, opinions, and festival attendance. Two are definitely not significant:
marriage ties and marriage residence. The sixth predictor, geographic distance, is
in a middle range where it is nearly significant. In the remaining figures in this
chapter, significance with respect to the worst case is not computed in the tables
because in every case the results are significant. An accuracy of 69% is
significantly greater than the worst case of 56% at the .05 level (computed on
ratios with a denominator of 78).

Figure 6.4 Single variable predictors of intelligibility

Predictor              Chance alone     With model       Significance
                       (worst case)

Geographic distance    56% (44/78)      67% (52/78)      .09
Lexical similarity     56% (44/78)      77% (60/78)      .003
Opinions               56% (44/78)      77% (60/78)      .003
Festival attendance

When the relations of lexical and geographic distance are made nonsymmetric by
considering distance relative to the system, the result is a significant
improvement in prediction accuracy over the symmetric measure of absolute distance
in Figure 6.4. This is shown in Figure 6.5. The column of significance figures
shows that the increase from 67% to 83% prediction accuracy for geographic distance
is very significant. The increase from 77% to 86% for lexical distance is only
nearly significant at the .07 level. However, the overall effect of measuring
distance as relative rather than absolute (which is obtained by summing the results
for geographic and lexical distance) is an increase from 72% to 85% which is very
significant (.003).

Figure 6.5 Absolute, relative, and composite distance

                       Absolute         Relative         Composite

Geographic distance    67% (52/78)      83% (65/78)      90% (70/78)
Lexical distance       77% (60/78)      86% (67/78)      90% (70/78)

Overall                72% (112/156)    85% (132/156)    90% (70/78)

Significance of:

                       Relative >       Composite >      Composite >
                       absolute         relative         absolute

Geographic distance    .008             .12              .0002
Lexical distance       .07              .23              .02
Overall                .003             .14              .001

Inspection of the scattergrams and step functions for relative lexical and
geographic distance (see the last three scattergrams in Appendix 2.2) shows that
relative geographic distance is a better predictor of intelligibility in the low
intelligibility range while relative lexical distance is a better predictor in the
high intelligibility range. That is, the greatest number of incorrect predictions
for relative geographic distance are underestimates when measured intelligibility is
"full intelligibility" and the greatest number of incorrect predictions for relative
lexical distance are overestimates when measured intelligibility is "sporadic
recognition". Since the strengths and weaknesses of the two different models are
complementary, it follows that a combination of the two predicting variables might
balance the weaknesses and yield a better prediction. This is indeed the case. An
optimal combination of the two relative measures of distance would be one which
maximizes the prediction accuracy of the new composite variable. By iterating over
various weightings of the two variables at steps of one-hundredth, it was found that
the optimal combination consists of 40% relative lexical distance and 60% relative
geographic distance. That is,

$$ \text{composite relative distance} = 0.4 \times \text{relative lexical distance} + 0.6 \times \text{relative geographic distance} $$
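
The search over weightings can be sketched as follows. This is only an
illustration under stated assumptions: the step function is fitted by brute force
rather than by inspection, levels are coded 0, 1, and 2, the names are invented, and
the six data points are hypothetical, so the weight it finds has nothing to do with
the 40/60 result reported for Santa Cruz.

```python
import math
from itertools import combinations_with_replacement


def min_deviations(distances, levels):
    """Smallest sum of deviations of any three-level step function on a distance.

    Level 2 is predicted below the first cutoff, 1 below the second, else 0.
    """
    cuts = sorted(set(distances)) + [math.inf]
    best = math.inf
    for c1, c2 in combinations_with_replacement(cuts, 2):
        total = sum(abs((2 if x < c1 else 1 if x < c2 else 0) - level)
                    for x, level in zip(distances, levels))
        best = min(best, total)
    return best


def best_weight(lexical, geographic, levels, steps=100):
    """Try lexical weights 0, 1/steps, ..., 1; return (weight, deviations)."""
    best = None
    for k in range(steps + 1):
        w = k / steps
        composite = [w * lx + (1 - w) * g for lx, g in zip(lexical, geographic)]
        total = min_deviations(composite, levels)
        if best is None or total < best[1]:
            best = (w, total)
    return best


# Hypothetical relative distances (scaled 0..1) and measured levels.
lexical = [0.1, 0.2, 0.8, 0.9, 0.3, 0.7]
geographic = [0.1, 0.8, 0.2, 0.9, 0.3, 0.7]
levels = [2, 1, 1, 0, 2, 0]

weight, deviations = best_weight(lexical, geographic, levels)
```

In this toy data each distance alone leaves two deviations, while a range of
composite weightings leaves none; the sketch reports the first such weight it
meets.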

Figure 6.5 shows that composite relative distance predicts intelligibility with
an accuracy of 90%. This is an increase above 83% for relative geographic distance
and 86% for relative lexical distance. However, tests of significance show that the
size of these increases is not significant. The total improvement from the
individual measures of absolute distance to composite distance, however, proves to be
significant for lexical distance (.02) and highly significant for geographic
distance (.0002).

6.3.5 Complex models including linguistic similarity and contact

In the preceding section, simple models were considered in which
intelligibility was viewed as a function of one factor, either linguistic similarity
or contact. In this section, complex models in which intelligibility is viewed as a
function of both similarity and contact are considered. The basic model has already
been introduced in Section 6.2.1. It is as follows,

$$ F = L + C\,(100 - L) $$
$$ I = f(F) $$

First, linguistic similarity and contact combine directly to predict the
"linguistic familiarity", or $F$. The familiarity is a percentage estimate of what
portion of the dialect of the speaker is familiar to the hearer, either through
similarity or contact or both. Second, intelligibility is predicted from the
familiarity. The function which maps familiarity onto intelligibility is a step
function. When an actual model is specified, it is necessary to make the function
explicit by stating the ranges of $F$ for each of the values of $I$.
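
A minimal sketch of the two-stage model follows. The familiarity formula is the one
given above, with similarity as a percentage and contact scaled from zero to one.
The cutoffs of 60 and 80 in the step function are placeholders chosen for the
example; the ranges actually used have to come from fitting the data.

```python
def familiarity(similarity_pct: float, contact: float) -> float:
    """F = L + C(100 - L), with L in percent and C in the range 0..1."""
    if not 0 <= similarity_pct <= 100:
        raise ValueError("similarity must be a percentage between 0 and 100")
    if not 0 <= contact <= 1:
        raise ValueError("contact must be scaled to the range 0..1")
    return similarity_pct + contact * (100 - similarity_pct)


def intelligibility(f: float, partial_cut: float = 60.0, full_cut: float = 80.0) -> str:
    """I = f(F): a step function. The default cutoffs are hypothetical."""
    if f < partial_cut:
        return "sporadic recognition"
    if f < full_cut:
        return "partial intelligibility"
    return "full intelligibility"


# A dialect pair with 70% cognates and a contact factor of one half.
f_value = familiarity(70, 0.5)
level = intelligibility(f_value)
```

Contact here fills in part of the unfamiliar remainder (100 - L), so familiarity
never falls below similarity and never exceeds 100.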

Only one measure of linguistic similarity, $L$, is available in the present data.
That is lexical similarity expressed as a percentage of cognates in basic
vocabulary. This measure will be used for linguistic similarity in all
formulations.

The object of the investigation in this section is to explore different
variables which can be substituted for the contact factor, $C$, in the familiarity
formula. First, measured values of contact are used. Then predicted values are
tried. The predicted values are based on the attraction and motivation model
presented in Section 6.2.2. Attraction and motivation are estimated first from
measured contact, then from population relations, and finally from distance to the
center of the dialect system.

In order to ensure that the linguistic familiarity does not exceed 100%, the
contact factor must be limited to a range of zero to one. For instance, measured
contact through church festivals has the range zero to two. To adjust this variable
for inclusion in the familiarity formula, the values need to be divided by two.

The situation for a variable like geographic distance is not as simple. The
values of that variable measured in traveling time range from 5 minutes to 830
minutes. Furthermore, in making the adjustment the values must be inverted; that
is, a high value of geographic distance implies low contact, and a low distance
implies high contact.

The method used is this. First examine the distribution of values to determine
the desired "minimum" and "maximum" values. These are not the true minimum and
maximum; rather, they are the value which is to adjust to zero, and the one which is
to adjust to one, respectively. For geographic distance, 830 is the minimum value.
For the maximum value, 105 was selected on the following basis: on the island, it
was observed that when neighboring dialect groups were within 105 minutes of
traveling time, contact was always so great as to give the appearance of complete
familiarity. The adjustment is made using this formula:

$$ \text{adjusted value} = \frac{\text{original value} - \text{min}}{\text{max} - \text{min}} $$

Then if the adjusted value exceeds one, it is set to one. If it is less than zero,
it is set to zero, unless negative values are used to reflect negative attitudes
(they are not in this study). The minimum and maximum values for the scaling of all
contact variables are reported in Appendix 2.3.
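
The adjustment can be written as a small function. The sketch below uses the
"minimum" of 830 and "maximum" of 105 minutes given above for geographic distance;
the function name is invented for the example, and negative values are not
supported, matching their absence from this study.

```python
def scale_contact(value: float, minimum: float, maximum: float) -> float:
    """Scale so that `minimum` maps to 0 and `maximum` to 1, clamped to 0..1.

    `minimum` may be numerically larger than `maximum`; that is how a distance
    is inverted, so that a long traveling time gives low contact.
    """
    if minimum == maximum:
        raise ValueError("minimum and maximum must differ")
    adjusted = (value - minimum) / (maximum - minimum)
    return min(1.0, max(0.0, adjusted))


# Traveling times in minutes, scaled with the values given for geographic distance.
farthest = scale_contact(830, minimum=830, maximum=105)
threshold = scale_contact(105, minimum=830, maximum=105)
nearest = scale_contact(5, minimum=830, maximum=105)
halfway = scale_contact(467.5, minimum=830, maximum=105)
```

Any traveling time of 105 minutes or less is clamped to full contact, which mirrors
the observation that such neighbors appeared completely familiar with each other.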

6.3.5.1 Measured contact

Five different kinds of measured contact are substituted into the familiarity
formula to predict intelligibility. The scattergrams, best-fit step functions, and
statistics for these five measures of contact are found in Appendix 2.4. The
percentages of accuracy for the five predicting models are summarized in Figure 6.6.

In the top half of Figure 6.6, the hypothesis that predictions based on
similarity and contact are more accurate than predictions based on contact alone is
tested. The first column of numbers gives the prediction accuracies for models with
contact alone; these are copied from Figure 6.4. The next column gives the
prediction accuracies for complex models which combine the given contact factor
with lexical similarity. The final column gives results of the significance tests
on the hypothesis that the complex models are more accurate than the simple ones.
In all five cases the complex model has the higher prediction accuracy. In three
cases (geographic distance, marriage ties, and marriage residence) the
improvement is significant. The overall effect, obtained by combining all the
ratios in the columns, is a highly significant increase in prediction accuracy.

The bottom half of Figure 6.6 tests the hypothesis that the complex models
combining similarity and contact give better predictions than the model based on
similarity alone. The first column of numbers gives the prediction accuracy for the
similarity model from Figure 6.4. The second column gives the prediction accuracies
for the complex models. The final column gives results of the significance tests on
the hypothesis that the complex models are more accurate than the similarity model.
In only three of the five cases is there any increase in accuracy, and this is never
significant. The overall effect, as well, shows no significant improvement of the
complex models based on measured contact and similarity over the simple similarity
model.

6.3.5.2 Predicted contact

Predicted contact is calculated for seven different factors. The first three
are based on overall attraction and motivation measures for opinions about
intelligibility, church festival attendance, and marriage residence. The remaining
four are based on population, density of population, geographic distance from the
center of the dialect system, and lexical distance from the center. The basic
formula for predicting contact is the one developed in Section 6.2.2:

$$ \text{Contact} = \frac{\text{Attraction} \times \text{Motivation}}{\text{distance}} $$

Attraction is estimated by the overall attraction, the population or its density, or
the inverse of the distance from the center. Motivation is estimated by the overall
motivation, the inverse of population or density, or the distance from the center.
Five different measurements of distance are used: absolute and relative geographic
distance, absolute and relative lexical distance, and composite relative distance
(six-tenths geographic and four-tenths lexical). The method in which each of these
variables was scaled to a range of zero to one is given in Appendix 2.3. The
distance measures were actually inverted and then scaled. In this way, predicted
contact becomes the product of attraction, motivation, and inverted distance. Since
the three component variables range from zero to one, the resulting predicted value
of contact is guaranteed to range from zero to one.

Figure 6.6 Measured contact

Contact variable       Contact alone    With similarity  Significance
                                                         of improvement

Geographic distance    67% (52/78)      78% (61/78)      .05
Opinions               77% (60/78)      83% (65/78)      .16
Festival attendance    79% (45/57)      84% (48/57)      .23
Marriage ties          58% (45/78)      74% (58/78)      .01
Marriage residence     62% (48/78)      77% (60/78)      .02

Overall                68% (250/369)    79% (292/369)    .0002

Contact variable       Similarity alone With contact     Significance
                                                         of improvement

Geographic distance    77% (60/78)      78% (61/78)      .43
Opinions               77% (60/78)      83% (65/78)      .16
Festival attendance    77% (60/78)      84% (48/57)      .15
Marriage ties          77% (60/78)      74% (58/78)      *
Marriage residence     77% (60/78)      77% (60/78)      .50

Overall                77% (60/78)      79% (292/369)    .33

* Since the second column is lower, the significance of the
  hypothesis that the second column is greater cannot be tested.

For each of the seven variables, eighteen different sets of contact predictions
were made. These eighteen sets are organized into two intersecting dimensions
containing three and six members. The first dimension represents the numerator of
the contact formula. Three different numerators are tried: attraction alone,
motivation alone, and attraction times motivation. The second dimension represents
the denominator of the contact formula and six different denominators are tried.
The first is a constant value of one which has the effect of leaving distance out of
the formula. The other five are the five types of distance already mentioned. The
results of the eighteen sets of predictions for the seven variables are given in
full in Appendix 2.5.
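
The layout of the eighteen sets can be made concrete with a short sketch. The labels
follow the text; the function name is invented, and the calculation assumes, as
stated earlier in this section, that attraction, motivation, and the inverted
distance have each been scaled to a range of zero to one.

```python
from itertools import product

# The three numerators of the contact formula.
NUMERATORS = {
    "attraction alone": lambda a, m: a,
    "motivation alone": lambda a, m: m,
    "attraction times motivation": lambda a, m: a * m,
}

# The six denominators: no distance at all, then the five types of distance.
DISTANCES = (
    "none (constant one)",
    "absolute geographic",
    "relative geographic",
    "absolute lexical",
    "relative lexical",
    "composite relative",
)

model_variants = list(product(NUMERATORS, DISTANCES))


def predict_contact(numerator, attraction, motivation, inverted_distance=1.0):
    """Product of the chosen numerator and an inverted, scaled distance.

    Leaving inverted_distance at 1.0 drops distance from the formula.
    """
    return NUMERATORS[numerator](attraction, motivation) * inverted_distance
```

For example, `predict_contact("attraction times motivation", 0.5, 0.5, 0.5)` uses
all three components, while `predict_contact("attraction alone", 0.5, 0.9)` ignores
both motivation and distance.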

The purpose of trying so many possible ways to predict contact is to test some
hypotheses about which kinds of predictions are better and which are worse. The
following hypotheses are tested:

(1) Relative distance predicts better than absolute distance;

(2) Relative distance predicts better than no distance;

(3) Composite distance predicts better than either measure of relative
    distance alone;

(4) Attraction and motivation together predict better than either alone;

(5) Predicted contact predicts better than measured contact;

(6) Contact predictions based on indirect measurements are equally as
    accurate as those based on direct contact measurements made in the
    field.

As is shown in the following paragraphs, hypotheses (1), (2), (5), and (6) can be
accepted; (3) and (4) cannot.

The first hypothesis, that relative distance in the denominator of the contact
formula is a better predictor than absolute distance, is tested in Figures 6.7 and
6.8. Figure 6.7 tests the hypothesis for geographic distance, while Figure 6.8
tests it for lexical distance. The two figures are otherwise completely parallel in
their layout. The ratios on which the prediction accuracies are computed are the
pooled results of the three different numerators for the contact equation:
attraction alone, motivation alone, and attraction times motivation. Thus the
ratios are the sums of the columns in the tables of results in Appendix 2.5.

In Figures 6.7 and 6.8, the first column ' of * numbers-, gives the prediction 
aqcur t acy for absolute distance -models while the -second column gives them for 
relative distance models." xThe last* column giyea the- significance levels, fori 
aooepting the hypothesis that the percentages in the, second column « are greater than 
those in the first. .For both geographic and* lexical distance, the percentage of 
accuracy for relative distance' models is , alwaVs greater than for absolute distance 
models. Fdr models based on opinions and. lexical' distance from the center it is 
a;so always significantly so. , For population, density , and marriage residence, 
however, it is not." For festival : attendance* and7 ¥ geographic distance from the 
c q enter, - the increase is nearly significant. ; "for both geographid and lexioaj. 
distance the overall Increase in prediction accuracy (obtained by summing the 
Figure 6.7 Absolute versus relative geographic distance

A and M predictor     Absolute          Relative          Significance
                      distance          distance          of improvement

Opinions              78% (182/234)     89% (208/234)     .0008
Festival attendance   76% (130/171)     82% (141/171)     .07
Marriage residence    75% (175/234)     80% (187/234)     [illegible]
Population            78% (183/234)     81% (190/234)     .21
Density               79% (186/234)     81% (190/234)     .32
Geographic center     80% (187/234)     85% (200/234)     .06
Lexical center        81% (189/234)     87% (204/234)     .03

Overall               78% (1232/1575)   84% (1320/1575)   .00003

Figure 6.8 Absolute versus relative lexical distance

A and M predictor     Absolute          Relative          Significance
                      distance          distance          of improvement

Opinions              80% (188/234)     90% (211/234)     .001
Festival attendance   78% (134/171)     84% (144/171)     .08
Marriage residence    78% (182/234)     82% (191/234)     .15
Population            78% (182/234)     81% (189/234)     .21
Density               79% (186/234)     83% (194/234)     .17
Geographic center     80% (188/234)     87% (203/234)     .03
Lexical center        82% (193/234)     88% (206/234)     .05

Overall               80% (1253/1575)   85% (1338/1575)   .00004

columns) is highly significant. Thus I accept the hypothesis that relative distance
in the denominator of the contact formula is a better predictor than absolute
distance.

The second hypothesis, that relative distance in the denominator of the contact
formula is a better predictor than no distance at all, is tested in Figure 6.9. In
that figure the test against relative lexical distance is shown. A test against
relative geographic distance would have very similar results. The table parallels
Figures 6.7 and 6.8 in its construction. The results in the significance column
show that in only two cases was the increase not significant, and in those two it
was nearly significant. The overall increase in prediction accuracy is highly
significant and I therefore accept the hypothesis.
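
The kind of comparison reported in the significance columns of Figures 6.7 to 6.9 can be sketched in Python. This is only an illustration under a stated assumption: the section does not name the test that was used, so a pooled two-proportion z-test is assumed here, and the function name and its arguments are hypothetical.

```python
import math


def proportion_increase_p_value(hits_a, n_a, hits_b, n_b, two_tailed=False):
    """P-value for the hypothesis that accuracy B exceeds accuracy A.

    Assumption: a pooled two-proportion z-test with a normal approximation.
    With two_tailed=True the direction of the difference is left unspecified.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / std_err
    # Upper tail of the standard normal distribution at z.
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    if two_tailed:
        return 2.0 * min(upper_tail, 1.0 - upper_tail)
    return upper_tail


# Marriage residence row of Figure 6.8: 182/234 (absolute) vs 191/234 (relative).
print(round(proportion_increase_p_value(182, 234, 191, 234), 2))
```

Under this assumption the marriage residence comparison comes out near .15, in line with the value listed in Figure 6.8. Agreement on one row does not establish that this was the test actually applied.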

Figure 6.9 No distance versus relative lexical distance

A and M predictor     No                Relative          Significance
                      distance          distance          of improvement

Opinions              82% (193/234)     90% (211/234)     .008
Festival attendance   76% (130/171)     84% (144/171)     .03
Marriage residence    70% (164/234)     82% (191/234)     .002
Population            73% (170/234)     81% (189/234)     .02
Density               74% (173/234)     83% (194/234)     .009
Geographic center     82% (193/234)     87% (203/234)     .10
Lexical center        83% (194/234)     88% (206/234)     .06

Overall               77% (1217/1575)   85% (1338/1575)   .0000001

The third hypothesis, that composite relative distance in the denominator of
the contact formula is a better predictor than either relative geographic or lexical
distance alone, is tested in Figure 6.10. The three columns of numbers represent
the three different measures of distance: relative geographic, relative lexical,
and composite relative. The ratios are based on a pooling of the results for the
three different numerators as before. Within the table there are no significant
differences. Below the table, the results of significance tests on the overall
trends are given. The overall trend is not even in the direction of the original
hypothesis. That is, relative lexical distance turns out to have a higher overall
prediction accuracy than composite distance. Therefore, two-tailed tests of
significance are made; these test the hypothesis that the prediction accuracies are
at all different, without specifying the direction of the difference. In the best
case, we would be wrong 38% of the time if we suggested that relative geographic and
relative lexical distance gave different results. There is no basis for accepting a
hypothesis that any one of these three measures of distance gives significantly
better results than the other two.
The fourth hypothesis, that models with attraction times motivation in the
numerator of the contact formula are better predictors than models with just
attraction or just motivation, is tested in Figure 6.11. The ratios in the body of
the table are based on a pooling of three models: with relative geographic distance
in the denominator, with relative lexical distance, and with relative composite
distance. These are the best models; the ones with absolute distance and no
distance are not included since they have been shown to be significantly less
accurate. In the overall trend, the attraction-times-motivation models prove to
have the highest ratio of prediction accuracy. The tests of significance show that
the increase over models of attraction alone is significant, but only at the .05
level; the increase over motivation alone models is not significant. Looking
within the body of the table, it is apparent that there is no strong trend. In one
case the attraction alone model is the best and in two cases the motivation alone
models are the best. In only four pairs of models is one significantly less than
the other. Attraction alone for opinions is significantly less than the other two
opinion models, and motivation alone for festival attendance is significantly less
than the other two. The results are thus largely inconclusive. It is not possible
to conclude that any particular combination of attraction or motivation or both
yields better prediction accuracy in general.

The fifth hypothesis, that predicted contact is a better predictor than
measured contact, is tested in Figure 6.12. There are only three contact factors
for which the accuracy of measured and predicted values can be compared. The
percentages and ratios of prediction accuracy for measured contact are copied from
Figure 6.6. The figures for predicted contact are taken from Appendix 2.5. For
each of the three contact factors, the most accurate model is chosen to fill in the
predicted contact column. Comparison of the columns shows that in all three cases
the accuracy with predicted contact is higher than with measured contact. In the
case of opinions it is significantly higher; in the case of festival attendance it
is not; in the case of marriage residence it is nearly so. The overall trend shows
that predicted contact raises the accuracy from 81% to 89%, which is a significant
increase at the .02 level.

The sixth hypothesis, that contact predictions based on indirect measurements
are as accurate as those based on direct measurements in the field, is tested in
Figure 6.13. For each of the seven contact factors, the best model from the tables
in Appendix 2.5 is chosen to represent it. The seven factors are grouped into three
categories and overall accuracy for each category is computed. The overall accuracy
for direct predictions based on measured contact is 89%, for indirect predictions
based on population statistics it is 84%, and for indirect predictions based on
distance from the center of the dialect system it is 88%.

The percentages for overall direct predictions and overall distance from center
predictions are nearly equal. A two-tailed test is used to test the hypothesis that
they are equal. The result shows that the probability that they are the same is
98%. The conclusion, therefore, is that the models using indirect contact
predictions based on distance from the center of the dialect system are as
accurate as the models which use direct contact predictions based on field
measurements of contact. The prediction accuracy for predictions based on
population statistics is less than for the other two categories. Thus tests were
Figure 6.10 Composite versus single relative distance

                      Geographic        Lexical           Composite

Opinions              89% (208/234)     90% (211/234)     90% (210/234)
Festival attendance   82% (141/171)     84% (144/171)     83% (142/171)
Marriage residence    80% (187/234)     82% (191/234)     80% (188/234)
Population            81% (190/234)     81% (189/234)     82% (191/234)
Density               81% (190/234)     83% (194/234)     81% (190/234)
Geographic center     85% (200/234)     87% (203/234)     86% (201/234)
Lexical center        87% (204/234)     88% (206/234)     87% (204/234)

Overall               84% (1320/1575)   85% (1338/1575)   84% (1326/1575)

Significance tests on overall trends (two-tailed tests):

     Geographic ≠ Lexical      .38
     Composite ≠ Lexical       .56
     Geographic ≠ Composite    .78

Figure 6.11 Attraction and motivation

                      Attraction        Motivation        Attraction and
                      alone             alone             motivation

Opinions              86% (202/234)     91% (213/234)     91% (214/234)
Festival attendance   86% (147/171)     79% (135/171)     85% (145/171)
Marriage residence    79% (186/234)     83% (194/234)     79% (186/234)
Population            81% (189/234)     82% (191/234)     83% (195/234)
Density               81% (189/234)     81% (190/234)     83% (195/234)
Geographic center     85% (199/234)     87% (203/234)     86% (202/234)
Lexical center        85% (199/234)     88% (206/234)     89% (209/234)

Overall               83% (1311/1575)   85% (1332/1575)   85% (1346/1575)

Significance tests for overall trends:

     Motivation > Attraction    .15
     A and M > Motivation       .24
     A and M > Attraction       .05



Figure 6.12 Measured contact versus predicted contact

                      Measured          Predicted         Significance
                                                          of improvement

Opinions              83% (65/78)       92% (72/78)       .04
Festival attendance   84% (48/57)       88% (50/57)       [illegible]
Marriage residence    77% (60/78)       86% (67/78)       [illegible]

Overall               81% (173/213)     89% (189/213)     .02

Figure 6.13 Direct predictions versus indirect predictions

Direct predictions based on measured contact:

     Opinions                           92% (72/78)
     Festival attendance                88% (50/57)
     Marriage residence                 86% (67/78)

     Overall                            89% (189/213)

Indirect predictions based on population statistics:

     Population                         83% (65/78)
     Density of population              85% (66/78)

     Overall                            84% (131/156)

Indirect predictions based on distance from center of dialect system:

     Geographic distance from center    87% (68/78)
     Lexical distance from center       90% (70/78)

     Overall                            88% (138/156)

Significance tests:

     Overall direct ≠ overall distance from center (two-tailed)    .98
     Overall direct > overall population                           .09
     Opinions > overall population                                 .04
     Overall distance from center > overall population             .12

made to see if they are significantly less accurate. The significance levels for
the tests are .09 and .12, so it is not possible to conclude that the indirect
predictions based on population statistics are significantly less accurate than
predictions based on measured contact or distance from the center. However, a test
of the best single model, the opinions model at 92% accuracy, against the overall
accuracy of population statistics shows that they are significantly less accurate
than the best model.

Thus far, no attempt has been made to combine different contact factors in
predicting intelligibility. Obviously, contact has many facets, and predictions
which simultaneously account for many aspects of contact, rather than considering
them only one at a time, should be better predictions. A simple way to use step
functions in making predictions which combine factors is to select an odd number of
factors and for each case to predict the level of intelligibility indicated by the
majority of the factors considered individually. (This method has the weakness that
it does not consider the possible interaction of factors.) This is done with three
factors (composite relative distance alone, predicted contact based on opinions,
and predicted contact based on lexical distance from the center) for the Santa
Cruz data. In most cases, all three predictions agree. Where they do not, the
level estimated by two out of the three factors is taken as predicted
intelligibility. The resulting predictions are 95% accurate. More details on the
method used and a complete matrix of estimated intelligibility based on this
combined approach are given in Appendix 2.
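
The majority rule described above can be sketched as follows. The intelligibility levels and the three lists of predictions are invented for illustration and are not the Santa Cruz data; returning `None` when no level has a majority is an assumption of this sketch, since the text only covers the case where at least two factors agree.

```python
from collections import Counter


def majority_level(predictions):
    """Return the level predicted by a strict majority of factors, else None."""
    counts = Counter(predictions)
    level, votes = counts.most_common(1)[0]
    if votes * 2 > len(predictions):
        return level
    # Assumption: no majority means no combined prediction is made.
    return None


# Hypothetical predicted levels for four dialect pairs, one list per factor.
by_distance = [3, 2, 1, 2]
by_opinions = [3, 2, 2, 1]
by_lexical_center = [3, 1, 2, 3]

combined = [
    majority_level(case)
    for case in zip(by_distance, by_opinions, by_lexical_center)
]
print(combined)
```

Because each factor is reduced to a discrete level before voting, the rule cannot capture interactions between factors, which is the weakness noted in the text.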

6.3.6 Summary of refinements 

A summary of the refinements which have been made in Sections 6.3.4 and 6.3.5
to models for explaining communication is given in Figure 6.14. In that chart the
percentages of prediction accuracy for different types of models are given. The
arrows show the directions of refinement as new factors are combined to improve the
accuracy of the models. The numbers on the arrows are the significance levels for a
test of the hypothesis that the model at the head of the arrow is more accurate than
the one at the tail.

The initial models are single variable models. A model which explains
intelligibility as a function of lexical similarity alone is 77% accurate (from
Figure 6.4). Overall, models which explain intelligibility as a function of some
single factor of contact are 72% accurate. This estimate is based on a pooling of
results from opinions, festival attendance, and marriage residence from Figure 6.4;
only these three are considered in order to maintain comparability at further steps
in the development. These degrees of accuracy are significantly greater than what
is possible by chance alone: 77% accuracy for lexical similarity is greater than
56% for the worst case (44/78) at .003 significance; 72% accuracy for contact is
greater than 58% for the worst case (123/213) at .001 significance.

When lexical similarity and contact are combined in a more complex model to
explain intelligibility, the degree of accuracy increases to 81% (Figure 6.6). This
is greater than the accuracy for similarity alone at a significance level of .20, and
greater than the accuracy for contact alone at a significance level of .01. The model
used to predict intelligibility is

     I = f(L + C(100 - L))

where I represents the level of intelligibility, L represents the percentage of
lexical similarity, and C represents the degree of contact. The contact measures
Figure 6.14 Summary of successive refinements to models
            for explaining communication

Single variable models:

     Lexical similarity alone            77% (60/78)
     Contact alone                       72% (153/213)
     Absolute distance alone             72% (112/156)
     Relative distance alone             85% (132/156)
          [Distance measured relative to perspective from
          within the dialect system]

Complex models:

     Similarity and measured contact     81% (173/213)
     Similarity and predicted contact    89% (189/213)
          [Measured contact summarized as overall attraction and
          motivation within the dialect system]
                                         88% (138/156)
          [Contact predicted by distance from the center of the
          dialect system]
     Combination of three models         95% (74/78)

Significance levels on the arrows, in order of appearance: .02, .003, .12, .01

used as predictors in the single variable models were scaled to a range of zero to
one and then plugged directly into the prediction formula.
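
A small sketch may help show how the prediction formula behaves. It assumes the formula reads I = f(L + C(100 - L)), with L a percentage and C scaled to the range zero to one as described above. The step function `step_level` and its thresholds are purely hypothetical stand-ins for f, which is not specified in this section.

```python
def combined_score(lexical_similarity, contact):
    """L + C(100 - L): similarity plus the share of the remaining gap closed by contact."""
    if not 0.0 <= lexical_similarity <= 100.0:
        raise ValueError("lexical_similarity must be a percentage in [0, 100]")
    if not 0.0 <= contact <= 1.0:
        raise ValueError("contact must be scaled to the range [0, 1]")
    return lexical_similarity + contact * (100.0 - lexical_similarity)


def step_level(score, thresholds=(50.0, 80.0)):
    """Hypothetical step function f: count how many thresholds the score reaches."""
    return sum(score >= t for t in thresholds)


score = combined_score(70.0, 0.5)
print(score, step_level(score))
```

With no contact the score is just the lexical similarity, and with full contact it reaches 100 regardless of similarity, so contact can only raise the predicted level.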

Another kind of single variable model tested was a model based on absolute
geographic or lexical distance between dialects. Those models were 72% accurate
overall (Figure 6.5). When the measure of distance is refined by making it relative
to the perspective from within the dialect system, the accuracy increases to
85% (Figure 6.5). This increase is significant at the .003 level.

A further refinement to models for explaining communication is to predict
contact rather than to measure it. The resulting models combine the three single
variable predictors already discussed (lexical similarity, measured contact, and
distance) and are 89% accurate overall (Figure 6.12). This accuracy is greater
than that of the complex models using measured contact at .02 significance, and is
greater than the accuracy for models with distance alone at the .12 level. The
formula used to predict contact is

     C = AM/D

where C represents the hearer's contact with the speech of the speaker, A represents
the attraction of the speaker's group, M represents the motivation of the hearer's
group to have contact, and D represents the distance from the hearer's group to the
speaker's. Relative distance is used for the distance term. Attraction and
motivation are estimated by the overall attraction and motivation of dialect groups
as indicated in the measured contact data.
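
The contact formula can be illustrated with a short sketch. It assumes that a group's attraction is the sum of the contact other groups report having with it, and that its motivation is the sum of the contact it reports having with others; the exact scaling used in the study is not given here, so the raw sums are left unscaled. The matrices below are invented for illustration.

```python
def predict_contact(measured, distance):
    """Predict contact C[h][s] = A[s] * M[h] / D[h][s] for hearer h and speaker s.

    measured[h][s] is the measured contact of hearer group h with speaker group s;
    distance[h][s] is the (relative) distance from h to s. Diagonal cells are None.
    """
    n = len(measured)
    # Attraction of s: total contact that the other groups have with s.
    attraction = [sum(measured[h][s] for h in range(n) if h != s) for s in range(n)]
    # Motivation of h: total contact that h has with the other groups.
    motivation = [sum(measured[h][s] for s in range(n) if s != h) for h in range(n)]
    predicted = [[None] * n for _ in range(n)]
    for h in range(n):
        for s in range(n):
            if h != s:
                predicted[h][s] = attraction[s] * motivation[h] / distance[h][s]
    return predicted


# Hypothetical three-dialect example (rows are hearers, columns are speakers).
measured = [[0, 2, 1], [3, 0, 1], [3, 2, 0]]
distance = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
predicted = predict_contact(measured, distance)
print(predicted[0][1], predicted[2][0], predicted[1][2])
```

Summing over a whole row or column turns a handful of coarse pairwise scores into a finer-grained estimate for each group, which is the sense in which the predicted values can discriminate more than the original measurements.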

Another refinement simplifies the task of data collection with no significant
loss in accuracy of the model. The same formula is used as in the predicted contact
models in the previous paragraph. The difference is that attraction and motivation
are estimated by the distance of the dialects from the center of the dialect system.
This kind of estimation does not require the collection of pairwise contact
measurements in the field as the method in the previous paragraph does. The overall
accuracy of these models which predict contact by distance from the center is 88%.
This is not significantly different from the accuracy of the estimates based on
pairwise field measurements (Figure 6.13).

A final refinement is to combine different contact factors in predicting
intelligibility. Since contact involves many factors, considering different aspects
in combination gives better predictions than considering any one aspect by itself.
When this is done for the Santa Cruz data, the resulting predictions are 95%
accurate.

6.3.7 Conclusions

6.3.7.1 Explaining communication on Santa Cruz Island 

Certain conclusions concerning what explains communication on Santa Cruz Island
can be made from this study. First of all, the lexical similarity between dialects
is an important factor. By itself it can correctly account for the level of
intelligibility in 77% of the cases. Secondly, contact between dialects, in
combination with similarity, accounts for half of the remaining incorrect cases in
general (88% accuracy, Figure 6.14) and over three-quarters of the remaining
incorrect cases in the final combination model (95% accuracy, Figure 6.14).

The results give some indication of what aspects of contact are most
significant in explaining communication on Santa Cruz Island. Marriage ties and
predicted marriage residence turn out to be the least effective explainers of
communication. The frequency with which dialects have contact at yearly church
festivals proves to be more effective. The significant increase in accuracy of
relative distance models over absolute distance models indicates that a Santa Cruz
speech community's motivation to get out and travel long distances to make contact
increases as its distance (both geographic and linguistic) away from the other
speech communities increases. The success of the models which predict contact on
the basis of distance from the center of the dialect system suggests that Santa Cruz
Island is indeed a centered dialect system. The interpretation of those models is
that the nearer a dialect is to the center, the more likely it is to attract contact
and the less motivated it is to go out and make contact. Conversely, the further a
dialect is from the center, the more motivated it is likely to be to go out and make
contact and the less likely it is to attract contact. The result is a general
directing of contact relations in toward the center.

6.3.7.2 Explaining communication elsewhere

This study of models for explaining communication on Santa Cruz Island suggests
at least three conclusions which could have general application: (1) the value of
local opinions about intelligibility, (2) the systemic nature of dialect relations,
and (3) the potential of the modeling method.

The single best predictor of intelligibility turned out to be local opinions
about intelligibility. Many investigators have stayed clear of informant opinions
because they are so open to a subjective element. At first glance the same
conclusion might be reached for Santa Cruz Island. Opinions alone are only 77%
accurate at explaining intelligibility (Figure 6.4). I would attribute the errors
not so much to errors in the informants' judgments as to the clumsiness of the
method I used to measure opinions. I found it possible to elicit responses at only
three degrees of understanding with any consistency: understand all, some, or
none. Of course, degree of understanding covers a continuous range. However, it
was possible to reconstruct predictions of a continuous nature from the original
opinions. This was possible when the whole island was viewed as a dialect system,
all of the opinions about a dialect were viewed as saying something about its
ability to attract communication, and all of the opinions given by a dialect were
viewed as saying something about its motivation to communicate. Predictions based
on these refinements of the original opinions are 92% accurate (Figure 6.13); this
15% improvement is significant at a .004 level. Refinements such as these may in
general increase the value of local opinions concerning intelligibility.

The key to explaining communication on Santa Cruz Island is viewing the
dialects as comprising a dialect system. In the summary of the results given in
Figure 6.14, comments in square brackets are placed at three spots in which the
system viewpoint plays a significant role. These are as follows: (1) The
measurement of distance as relative to perspective from within the system rather
than in absolute units (see Section 6.1.3) proved to significantly increase the
accuracy of single variable models (Figure 6.5) and complex models (Figures 6.7 and
6.8). (2) The pairwise measurements of contact taken in the field (opinions,
festival attendance, marriage ties) were made in terms of three or four point
discrete scales. The degree of discrimination possible in predictions based
directly on these measurements is therefore not very great. However, when the
individual pairwise measurements are viewed as being interrelated within a dialect
system, all of the measurements concerning individual dialects can be summed to
compute that dialect's overall attraction and motivation within the system.
Predictions based on these attraction and motivation estimates are significantly
better than those based on the original measurements (Figure 6.12). Relations based
on the whole system may not only offer a more discriminating measure than the
original data, they may also serve to compensate for measurement errors in the
original data. (3) One of the characteristics of a dialect system is that general
patterns of interaction can be explained in terms of common relations to a center
(Section 6.1.4). It was found that predictions based on distance from the dialect
center were as accurate as the predictions based on overall attraction and
motivation relations found in the original contact measurements (Figure 6.13). A
model which explains communication in terms of relations to a center is one order of
magnitude simpler than one which relies on all of the pairwise relations in the
system. That is, if there are n dialects, the centered model requires n raw data
measurements (the distance of each dialect from the center) while the other model
uses n-squared raw data measurements (to fill in a square matrix of pairwise contact
measurements). This increased simplicity is significant not only because it reduces
the complexity of data and model description, but also because it could reduce the
complexity of data collection.
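
The difference in data requirements between the two kinds of model can be made concrete with a minimal sketch. The function name and the example size of ten dialects are hypothetical; the counts follow the n versus n-squared comparison stated above.

```python
def raw_measurements_needed(n_dialects):
    """Raw data measurements needed by each kind of model for n dialects."""
    return {
        # One distance-from-center value per dialect.
        "centered": n_dialects,
        # One cell per hearer/speaker pair in a square contact matrix.
        "pairwise": n_dialects ** 2,
    }


counts = raw_measurements_needed(10)
print(counts)
```

For ten dialects the pairwise model needs ten times as many raw measurements as the centered one, and the gap widens as more dialects are added.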

A final conclusion regards the potential of this modeling approach. Three
models based on a single contact factor were 90% accurate or better: one based on
composite relative distance alone and two complex models with predicted contact
based on opinions or lexical distance from the center. When three factors were
combined by taking the level of intelligibility predicted by at least two of the
three factors, 95% accuracy was achieved. When the accuracy of intelligibility
predictions based on linguistic similarity and contact begins to exceed 90%, one
begins to wonder if the measured intelligibility data themselves are 90% accurate.
When predictions are that accurate they become useful indices by which to evaluate
the measured intelligibility. Of course, in this study, prediction accuracy was
measured by comparing the predicted values to the measured values. Thus we could
not have done anything without the intelligibility measurements. However, this is a
pilot study. As we come to better understand the workings of dialect systems
through further study, it may become possible one day to predict intelligibility,
even without first measuring it, with an accuracy greater than that with which we
could have measured it.



APPENDIX 1 
COMPLETE DATA FOR THE 
STUDY OF LEXICAL SIMILARITY AND INTELLIGIBILITY 



1.1 Sources of data

For each set of data six items of information are given: the source of the
intelligibility data, a brief note on the method of intelligibility testing, the
type of adjustment used to control for intelligibility measurement error (Section
5.2.4), the source of the cognate percentages, the type of word list used, and the
correspondence of three letter mnemonic codes to village or dialect names. The ten
studies are considered in alphabetical order of the name by which they are
referenced.

(1) Biliau - Biliau is spoken in the Madang Province of Papua New Guinea. The
dialect survey was conducted by myself and my wife, Linda, in 1976. The
intelligibility testing followed the method of Casad with two exceptions: the
questions were asked in the trade language and tests were administered to groups as
well as to individuals. The raw intelligibility scores are adjusted by the hometown
method. The word list used was the Swadesh 100-word list. The correspondence of
mnemonic codes to village names is:

     BIL = Biliau
     YAM = Yamal
     SUI = Suit

Unfortunately, the data here represent only the results of a pilot study; sickness
prevented the completion of the full survey. Neither the data nor the results have
been published elsewhere.

(2) Buang - Buang is spoken in the Morobe Province of Papua New Guinea. The
survey of Buang dialects was conducted by Gillian Sankoff between 1966 and 1968.
The intelligibility and lexicostatistic data are taken from Sankoff 1969. The
approach to intelligibility testing is similar to that of Casad though not as exact.
Subjects listened to a test tape and then answered three questions about it in order
to judge comprehension of events in the story (see Section 2.2.4 for a fuller
description). A proportional adjustment for subjects is used to adjust raw
intelligibility scores. The test list used for lexicostatistic comparison was a 162
item list comprising the Swadesh 100-word list plus a number of cultural items
specific to New Guinea. The correspondence of mnemonic codes to village names is:

     BUW = Buweyew
     MMB = Mambump
     WIN = Wins
     CHI = Chimbuluk
     PAP = Papekene
     MNG = Mangga
     KWA = Kwasang

Although the intelligibility tests were administered in seven different villages,
only three dialects were used in test tapes: MMB, CHI, and MNG. Thus for four of
the villages there are no proper hometown scores. For these four the hometown score
for one of the other three villages is used as an estimate. The one used is the one
in the same dialect group, according to Sankoff's grouping into three dialects.
Thus the hometown score for MMB serves also for BUW, the hometown score for CHI
serves also for WIN and PAP, and the hometown score for MNG serves also for KWA.

(3) Ethiopia - The data from Ethiopia come from the intelligibility survey of
the Sidamo languages conducted by Marvin Bender and Robert Cooper (1971). The test
method consisted of playing the test text, then having the subjects (who were school
children) answer multiple choice questions with four possible responses about the
contents of the story. These tests were written, and were conducted in the national
language. The method is described in more detail in Section 2.2.4.

Since the tests consisted of choosing the correct one out of four possible
answers, it is possible that a group of subjects with no knowledge of a language
could score 25% correct simply by chance. Therefore, when subjects scored less than
25%, it can be assumed that there was no comprehension. Thus the raw scores must
first be adjusted to remove the chance element. This is done by recomputing the
scores as the percentage of correct responses above the chance level. The score for
correct responses above the chance level is given by subtracting 25% from the raw
score, or by 0%, whichever is greater. The total possible above the chance level is
given by subtracting 25% from 100%, or 75%. The percentage of intelligibility
adjusted for chance is obtained by dividing the correct by the total possible and
multiplying by 100. That is,

     adjusted for chance = max(raw - 25, 0) / 75 x 100

The "raw" scores reported in Appendix 1.2 have already been adjusted in this way.
Bender and Cooper made no such adjustment; the technique was suggested by Ladefoged,
Glick, and Criper (1972:68). The adjusted scores for this set of data are further
adjustments on these raw scores. For this, the hometown adjustment was used.
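
The chance adjustment can be written out as a short function. The function name and the `chance_percent` parameter are illustrative; the default of 25 corresponds to the four-choice questions described above.

```python
def adjust_for_chance(raw_percent, chance_percent=25.0):
    """Rescale a raw percentage so that the chance level maps to 0 and 100 stays 100."""
    above_chance = max(raw_percent - chance_percent, 0.0)
    total_possible = 100.0 - chance_percent
    return above_chance / total_possible * 100.0


for raw in (10.0, 25.0, 62.5, 100.0):
    print(raw, adjust_for_chance(raw))
```

Scores at or below the chance level are all treated as zero comprehension, so the adjustment discards any distinction among them.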

The cognate percentages to accompany the intelligibility scores are found in
another source, Bender (1971). The word list used was the Swadesh 100-word list
with modifications dictated by experience in the Ethiopian field. Correspondence of
mnemonic codes to dialect names is:

     ALA = Alaba
     KEM = Kembata
     HAD = Hadiyya
     SID = Sidamo
     DER = Derasa
     BUR = Burji

(4) Iroquois - The intelligibility survey among the Iroquois languages,
northeastern United States, was conducted by Hickerson, Turner, and Hickerson
(1952). Their method was basically one of text translation and the study has been
described in more detail in Chapter 2, Section 2.2.1. A proportional adjustment for
subjects was used to compute the adjusted intelligibility scores. The cognate
percentages are taken from Floyd Lounsbury (1961). The word list used for
comparison was the Swadesh 200-word list. Hickerson, Turner, and Hickerson tested
intelligibility among six dialects, resulting in 36 pairwise measurements. The
lexicostatistic comparisons by Lounsbury involve only four of those six dialects.
Within the set of lexicostatistic comparisons, the percentage for one pair of
languages (Tuscarora with Cayuga) is missing. As a result, corresponding lexical
data were found for only fourteen of the thirty-six intelligibility measurements.
Only these fourteen data points are included in the sample. The correspondence of
mnemonic codes to dialect names is:

     SEN = Seneca
     CAY = Cayuga
     ONE = Oneida
     TUS = Tuscarora

(5) Mazatec - The intelligibility survey among the Mazatec dialects of Mexico
was carried out by Paul Kirk. The results of the survey were first published by
Kirk (1970) and then reproduced by Casad (1974:34-35, 47-49). The method used in
the testing was the Casad method (Section 2.2.3). The raw intelligibility scores
are adjusted by the hometown method. The lexicostatistic comparison of the Mazatec
dialects was done by Sarah Gudschinsky (1955). The Swadesh 200-word list was used
for the comparison. The correspondence of mnemonic codes to village names is:

     HUA = Huautla de Jimenez
     MAT = San Mateo
     MIG = San Miguel
     IXC = Ixcatlan
     SOY = Soyaltepec
     JAL = Jalapa de Diaz



(6) Polynesia - Intelligibility among the Polynesian languages and dialects was
tested by Jack Ward (1962). The method of testing used was a sentence translation
test. A constant adjustment for subjects was used to adjust raw intelligibility
scores. The lexicostatistic comparisons were performed by Samuel Elbert (1953).
The comparisons are based on Swadesh's early basic vocabulary of 165 words which
Elbert expanded to 202 words. The correspondence of mnemonic codes to language and
dialect names is:

EAS = Easter Island
HAW = Hawaiian
KAP = Kapingamarangi
MAN = Mangareva
MAO = New Zealand Maori
MAR = Marquesas
RAR = Rarotonga
SAM = Samoa
TAH = Tahiti
TON = Tonga
TUA = Tuamotu
UVE = Uvea
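
The two subject adjustments named so far (proportional for Iroquois, constant for Polynesia) can be checked against the listing in Appendix 1.2. The sketch below is an inference from the tabulated figures, not a restatement of the definitions in Section 5.2.4: it assumes that the proportional adjustment divides the raw score by the subjects' hometown score, that the constant adjustment adds the subjects' shortfall from 100%, and that both are capped at 100. The function names are illustrative.

```python
def adjust_proportional(raw: float, subj: float) -> float:
    """Scale the raw score by the subjects' hometown score (assumed form)."""
    return min(100.0, 100.0 * raw / subj)


def adjust_constant(raw: float, subj: float) -> float:
    """Add the subjects' shortfall from 100% to the raw score (assumed form)."""
    return min(100.0, raw + (100.0 - subj))


# Iroquois, SEN hearing CAY: raw 82, hometown 83, listed as 98.8
print(round(adjust_proportional(82, 83), 1))

# Polynesia, HAW hearing EAS: raw 28, hometown 94, listed as 34.0
print(round(adjust_constant(28, 94), 1))
```

Both calls reproduce the adjusted scores listed for those cases, which is the only support offered here for the assumed formulas.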



(7) Siouan - The intelligibility survey among the Siouan languages of the Great
Plains area of the United States and Canada was conducted by Warren Harbeck and
Raymond Gordon in 1968. The results of the survey are reported in an unpublished
paper (Harbeck ms [1969]). The method used for the testing was one of text
translation. The tests were scored in two ways: the first computed the accuracy of
an item by item translation and the second measured general comprehension by
checking for the presence of ten key pieces of information in the translation. The
raw intelligibility scores used in this study are the average of the results of the
two different scoring procedures. The raw intelligibility scores are adjusted by
the hometown method. The cognate percentages are from the same source and are based
on the Swadesh 100-word list. The correspondence of mnemonic codes to dialect names
is:

STO = Stoney
ASS = Assiniboine
MAN = Manitoba variety of Dakota-Nakota
NDK = North Dakota variety of Dakota-Nakota
LAK = Lakota



(8) Trique - The intelligibility survey of the Trique language area of Mexico
was carried out by Eugene Casad in 1970. The results of the survey are reported in
his manual on dialect intelligibility testing (1974:78-81, 191-192). The method of
testing was the question approach described in detail in the manual (Section 2.2.3).
The raw intelligibility scores are adjusted by the hometown method. The
lexicostatistic comparison was based on the Swadesh 100-word list. The
correspondence of mnemonic codes to village names is:

MIG = San Miguel
ITU = Itunyoso
LAG = Laguna
CHI = Chicahuaxtla
SAB = Sabana



(9) Uganda - These data are the results of intelligibility tests conducted with
speakers of two Bantu languages in Uganda by Peter Ladefoged (1968). The
intelligibility test results were extracted from page 67 of Ladefoged, Glick, and
Criper (1972), and the cognate percentages are extracted from page 71. The tests
were administered to literate school children. A short story from another language
was played and the listeners were asked what it was about. They were presented with
three possible answers which were also written in a test booklet and were asked to
write down the number of the appropriate response. The raw scores reported in
Appendix 1.2 are adjusted to account for the element of chance as already described
in the description of the Ethiopian data, the only difference being that here the
chance level is computed at 33.3%. The adjusted scores reported in Appendix 1.2
have undergone a further proportional adjustment for subjects. The word list used
for lexicostatistic comparison was a list designed especially for the Ugandan
survey. In setting up the new list the guiding principle was not to use basic
vocabulary which is supposedly more resistant to change, but to use meanings which
elicited reliable answers and which were valid indicators of the communicative
possibilities of the language as a whole (Ladefoged and others 1972:54-55). The
correspondence of mnemonic codes to language names is:

LUG = Luganda
RUN = Runyankore
RUT = Rutooro
RUK = Rukiga
LUM = Lumasaba
LUS = Lusoga

(10) Yuman - The intelligibility survey among the Yuman languages of the
southwestern United States was conducted by Bruce Biggs (1957). The method was
basically a text translation approach and is described in detail in Chapter 2,
Section 2.2.1. The cognate percentages which correspond to the intelligibility
percentages are reported by Biggs. They are taken from Werner Winter (1957). A
100-word list was used. The word list items "were chosen at random, though with
considerable emphasis on words from Swadesh lists" (Winter 1957:19). The
correspondence of mnemonic codes to dialect names is:

MAR = Maricopa
WAL = Walapai
YAV = Yavapai
MOH = Mohave
HAV = Havasupai



1.2 Complete listing of data

The following pages are a complete listing of the data, presented study by
study. The data are presented in eight columns. They are, in order: (1) "HEAR",
the hearers, the mnemonic code of the village or dialect taking the intelligibility
test; (2) "SPKR", the speaker, the mnemonic code of the village or dialect which is
speaking on the test tape; (3) "LEX", the percentage of lexical cognates;
(4) "INT RAW", the raw percentage of intelligibility; (5) "INT ADJ", the adjusted
percentage of intelligibility (for each set, the method of adjusting is described in
Appendix 1.1); (6) "EXCLUDE", an "X" is listed if this case is excluded due to
nonsymmetric intelligibility attributed to social factors; (7) "SUBJ", the hometown
score for the group of subjects (used in adjusting raw intelligibility); (8) "TEST",
the hometown score for the test which is being administered (used in adjusting raw
intelligibility).

(3) Ethiopia

HEAR  SPKR  LEX  INT RAW  INT ADJ  EXCLUDE  SUBJ  TEST
ALA   KEM    81       95     94.7     X       91    99
ALA   HAD    54       61     61.3             91    89
ALA   SID    64       28     28.0             91    95
ALA   DER    49       16     16.0             91    81
ALA   BUR    40       13     13.3             91    91
KEM   ALA    81       79     78.7             99    91
KEM   KEM   100       99    100.0             99    99
KEM   HAD    56       49     49.3             99    89
KEM   SID    62       23     22.7             99    95
KEM   DER    49        9      9.3             99    81
KEM   BUR    39       24     24.0             99    91
HAD   ALA    54       67     66.7             89    91
HAD   KEM    56       65     65.3     X       89    99
HAD   HAD   100       89    100.0             89    89
HAD   SID    53        ?        ?             89    95
HAD   DER    42       33     33.3             89    81
HAD   BUR    38       20     20.0             89    91
SID   ALA    64       40     40.0     X       95    91
SID   KEM    62       16     16.0             95    99
SID   HAD    53       25     25.3             95    89
SID   SID   100       95    100.0             95    95
SID   DER    60       13     13.3             95    81
SID   BUR    41       29     29.3             95    91
DER   ALA    49       32     32.0     X       81    91
DER   KEM    49       24     24.0     X       81    99
DER   HAD    42       21     21.3             81    89
DER   SID    60       41     41.3             81    95
DER   DER   100       81    100.0             81    81
DER   BUR    37       15     14.7             81    91

(4) Iroquois

HEAR  SPKR  LEX  INT RAW  INT ADJ  EXCLUDE  SUBJ  TEST
SEN   SEN   100       83    100.0             83    83
SEN   CAY    72       82     98.8     X       83    80
SEN   ONE    65       30     36.1             83    46
SEN   TUS    50        0      0.0             83    83
CAY   SEN    72       54     67.5             80    83
CAY   CAY   100       80    100.0             80    80
CAY   ONE    73        7      8.7             80    46
ONE   SEN    65       17     37.0             46    83
ONE   CAY    73       18     39.1     X       46    80
ONE   ONE   100       46    100.0             46    46
ONE   TUS    59        0      0.0             46    83
TUS   SEN    50        0      0.0             83    83
TUS   ONE    59        0      0.0             83    46
TUS   TUS   100       83    100.0             83    83

(5) Mazatec

HEAR  SPKR  LEX  INT RAW  INT ADJ  EXCLUDE  SUBJ  TEST
HUA   HUA   100       92    100.0             92    92
HUA   JAL    74       35     35.0             92    95
MAT   HUA    94       90     90.0             93    92
MAT   MAT   100       93    100.0             93    93
MAT   JAL    82       33     33.0             93    95
MIG   HUA    94       93     93.0            100    92
MIG   MIG   100      100    100.0            100   100
MIG   JAL    82       56     56.0            100    95
IXC   HUA    78       76     76.0             89    92
IXC   MIG    85       77     77.0             89   100
IXC   IXC   100       89    100.0             89    89
IXC   SOY    85       70     70.0             89    98
IXC   JAL    82       64     64.0             89    95
SOY   HUA    80       73     73.0             98    92
SOY   SOY   100       98    100.0             98    98
SOY   JAL    80       43     43.0             98    95
JAL   HUA    74       73     73.0     X       95    92
JAL   SOY    80       51     51.0     X       95    98
JAL   JAL   100       95    100.0             95    95
(6) Polynesia

HEAR  SPKR  LEX  INT RAW  INT ADJ  EXCLUDE  SUBJ  TEST
HAW   EAS    64       28     34.0             94    96
HAW   HAW   100       94    100.0             94    94
HAW   KAP    49       15     21.0             94    96
HAW   MAN    69       33     39.0             94    98
HAW   MAO    71       25     31.0             94    96
HAW   MAR    70       32     38.0             94    93
HAW   RAR    79       25     31.0             94    93
HAW   SAM    59       25     31.0             94    97
HAW   TAH    76       39     45.0             94    95
HAW   TON    49        3      9.0             94    98
HAW   TUA    77       39     45.0             94    97
HAW   UVE    55        9     15.0             94    96
MAN   EAS    64       48     50.0             98    96
MAN   HAW    69       41     43.0             98    94
MAN   KAP    49       26     28.0             98    96
MAN   MAN   100       98    100.0             98    98
MAN   MAR    73       58     60.0             98    93
MAN   RAR    75       74     76.0             98    93
MAN   SAM    55       24     26.0             98    97
MAN   TON    49        6      8.0             98    98
MAN   TUA    72       96     98.0     X       98    97
MAR   EAS    63       43     50.0             93    96
MAR   HAW    70       50     57.0     X       93    94
MAR   KAP    45       59     66.0             93    96
MAR   MAN    73       59     66.0             93    98
MAR   MAR   100       93    100.0             93    93
MAR   RAR    73       58     65.0     X       93    93
MAR   SAM    52       30     37.0     X       93    97
MAR   TON    45        8     15.0             93    98
MAR   TUA     ?        ?    100.0     X       93    97
RAR   EAS    64       34     41.0             93    96
RAR   HAW    79       30     37.0             93    94
RAR   KAP    54       19     26.0             93    96
RAR   MAN    75       57     64.0     X       93    98
RAR   MAR    73       32     39.0             93    93
RAR   RAR   100       93    100.0             93    93
RAR   SAM    67       19     26.0             93    97
RAR   TON    56        4     11.0             93    98
RAR   TUA    83       90     97.0     X       93    97
SAM   EAS    53       13     16.0             97    96
SAM   HAW    59       26     29.0             97    94
SAM   KAP    53       15     18.0             97    96
SAM   MAN    55       29     32.0             97    98
SAM   MAO    57       28     31.0             97    96
SAM   MAR    52       16     19.0             97    93
SAM   RAR    67       17     20.0             97    93
SAM   SAM   100       97    100.0             97    97
SAM   TON    66       16     19.0             97    98
SAM   TUA    62       18     21.0             97    97
SAM   UVE    70       33     36.0             97    96
TAH   EAS    62       30     35.0             95    96
TAH   HAW    76       36     41.0             95    94
TAH   KAP    50       12     17.0             95    96
TAH   MAN    68       39     44.0             95    98
TAH   MAR    67       42     47.0             95    93
TAH   RAR    85       64     69.0             95    93
TAH   SAM    60       22     27.0             95    97
TAH   TAH   100       95    100.0             95    95
TON   EAS    48       15     17.0             98    96
TON   HAW    49       10     12.0             98    94
TON   KAP    45       12     14.0             98    96
TON   MAN    49       24     26.0     X       98    98
TON   MAR    45       10     12.0             98    93
TON   RAR    58       24     26.0     X       98    93
TON   SAM    66       32     34.0     X       98    97
TON   TON   100       98    100.0             98    98
TON   TUA     ?        8     10.0             98    97
TON   UVE    86       73     75.0             98    96
TUA   EAS    62       30     33.0             97    96
TUA   HAW    77       34     37.0             97    94
TUA   KAP    51       13     16.0             97    96
TUA   MAN    72       56     59.0             97    98
TUA   MAR    69       56     59.0             97    93
TUA   RAR    83       74     77.0             97    93
TUA   SAM     ?       21     24.0             97    97
TUA   TON    53        6      9.0             97    98
TUA   TUA   100       97    100.0             97    97

(7) Siouan

HEAR  SPKR  LEX  INT RAW  INT ADJ  EXCLUDE  SUBJ  TEST
STO   STO   100        ?        ?     ?        ?     ?
STO   ASS    89        ?        ?     ?        ?     ?
STO   MAN    86        ?        ?     ?        ?     ?
STO   NDK    85        ?        ?     ?        ?     ?
STO   LAK    83        ?        ?     ?        ?     ?
ASS   STO    89        ?        ?     ?        ?     ?
ASS   ASS   100        ?        ?     ?        ?     ?
ASS   MAN    94        ?        ?     ?        ?     ?
ASS   NDK    90        ?        ?     ?        ?     ?
ASS   LAK    89        ?        ?     ?        ?     ?
MAN   STO    86        ?        ?     ?        ?     ?
MAN   ASS    94        ?        ?     ?        ?     ?
MAN   MAN   100        ?        ?     ?        ?     ?
MAN   NDK    95        ?        ?     ?        ?     ?
MAN   LAK    91        ?        ?     ?        ?     ?
NDK   STO    85        ?        ?     ?        ?     ?
NDK   ASS    90        ?        ?     ?        ?     ?
NDK   MAN    95        ?        ?     ?        ?     ?
NDK   NDK   100        ?        ?     ?        ?     ?
NDK   LAK    90        ?        ?     ?        ?     ?
LAK   STO    83        ?        ?     ?        ?     ?
LAK   ASS    89        ?        ?     ?        ?     ?
LAK   MAN    91       79     79.0             96     ?
LAK   NDK    90       90     90.0             96    89
LAK   LAK   100       96    100.0             96    96

(8) Trique

HEAR  SPKR  LEX  INT RAW  INT ADJ  EXCLUDE  SUBJ  TEST
MIG   SAB   100       99    100.0     ?       99    98
MIG   ITU    84       56     56.0     ?       99    99
MIG   LAG    78       58     58.0     ?       99    98
ITU   SAB    84       92     92.0     ?       99    98
ITU   ITU   100       99    100.0     ?       99    99
ITU   LAG    87       98     98.0     ?       99    98
LAG   SAB    78       83     83.0     ?       98    98
LAG   ITU    87       86     86.0     ?       98    99
LAG   LAG   100       98    100.0     ?       98    98
CHI   SAB    78       74     74.0     ?       97    98
CHI   ITU    87       83     83.0     ?       97    99
CHI   LAG   100       97    100.0     ?       97    98
SAB   SAB   100       98    100.0     ?       98    98
SAB   ITU    84       64     64.0     ?       98    99
SAB   LAG    78       57     57.0     ?       98    98

(9) Uganda

HEAR  SPKR  LEX  INT RAW  INT ADJ  EXCLUDE  SUBJ  TEST
LUG   LUG   100       79    100.0             79    79
LUG   RUT    64       24     29.8             79    81
LUG   RUN    63       31     39.2             79    82
LUG   LUS    86       49     62.0             79    81
LUG   LUM    54       19     24.1             79    81
RUN   LUG    63       61     74.4     X       82    79
RUN   RUT    86       67     81.7             82    81
RUN   RUN   100       82    100.0             82    82
RUN   RUK    94       72     87.2             82    81
RUN   LUM    49        0      0.0             82    81

(10) Yuman

HEAR  SPKR  LEX  INT RAW  INT ADJ  EXCLUDE  SUBJ  TEST
MAR   MAR   100       96    100.0             96    96
MAR   WAL    57       18     22.0             96    96
MAR   YAV    57       13     17.0             96    94
MAR   MOH    85       67     71.0             96    84
MAR   HAV    58       10     14.0             96    91
WAL   MAR    57       14     18.0             96    96
WAL   WAL   100       96    100.0             96    96
WAL   YAV    91       96    100.0     X       96    94
WAL   MOH    63       27     31.0     X       96    84
WAL   HAV    95       91     95.0             96    91
YAV   MAR    57       12     18.0             94    96
YAV   WAL    91       82     88.0             94    96
YAV   YAV   100       94    100.0             94    94
YAV   MOH    62       20     26.0             94    84
YAV   HAV    92       78     84.0             94    91
MOH   MAR    85       77     93.0     X       84    96
MOH   WAL    63       11     27.0             84    96
MOH   YAV    62       13     29.0             84    94
MOH   MOH   100       84    100.0             84    84
MOH   HAV    63       16     32.0             84    91
HAV   MAR    58       11     20.0             91    96
HAV   WAL    95       98    100.0     X       91    96
HAV   YAV    92       83     92.0             91    94
HAV   MOH    63       18     27.0             91    84
HAV   HAV   100       91    100.0             91    91

1.3 Scattergrams for raw data

(1) Biliau

[Scattergram: INTELLIGIBILITY plotted against LEXICAL SIMILARITY]

All points (_____):     Int = .284 Lex + 66.3
Excluding x's (- - -):  Int = .519 Lex + 41.5

              N    %EV    Corr     Sig    SEE  Lex-100    Int-0
All points    9   18.1  .42487   .2543    6.1     94.7   -233.5
Excl x's      6   74.2  .86156   .0274    3.2     93.4    -79.9

(2) Buang

[Scattergram: INTELLIGIBILITY plotted against LEXICAL SIMILARITY]

              N    %EV    Corr     Sig    SEE  Lex-100    Int-0
All points   21   49.3  .70232   .0004   11.8     ?8.8     15.3
Excl x's     15   65.8  .81090   .0002   10.7     68.0     31.1

(3) Ethiopia

[Scattergram: INTELLIGIBILITY plotted against LEXICAL SIMILARITY]

All points (_____):     Int = 1.217 Lex - 30.5
Excluding x's (- - -):  Int = 1.208 Lex - 32.4

              N    %EV    Corr     Sig    SEE  Lex-100    Int-0
All points   30   71.6  .84592   .0001   16.2     91.2     25.1
Excl x's     23   75.1  .86677   .0001   16.1     88.5     26.8

(4) Iroquois

[Scattergram: INTELLIGIBILITY plotted against LEXICAL SIMILARITY]

All points (_____):     Int = 1.519 Lex - 76.9
Excluding x's (- - -):  Int = 1.540 Lex - 81.3

              N    %EV    Corr     Sig    SEE  Lex-100    Int-0
All points   14   66.0  .81267   .0004   21.0     75.0     50.6
Excl x's     12   80.9  .89944   .0001   15.8     72.7     52.8
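
The figures for the Iroquois line excluding x's can be reproduced from the Iroquois listing in Appendix 1.2. The sketch below assumes the lines are ordinary least-squares fits of raw intelligibility on lexical similarity; that assumption is supported only by the agreement of the computed values with the printed ones.

```python
from math import sqrt

# (LEX, INT RAW) for the twelve Iroquois cases not marked X in Appendix 1.2
points = [
    (100, 83), (65, 30), (50, 0),    # SEN hearing SEN, ONE, TUS
    (72, 54), (100, 80), (73, 7),    # CAY hearing SEN, CAY, ONE
    (65, 17), (100, 46), (59, 0),    # ONE hearing SEN, ONE, TUS
    (50, 0), (59, 0), (100, 83),     # TUS hearing SEN, ONE, TUS
]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
sxx = sum((x - mean_x) ** 2 for x, _ in points)
syy = sum((y - mean_y) ** 2 for _, y in points)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

slope = sxy / sxx                      # least-squares slope
intercept = mean_y - slope * mean_x    # intercept on the intelligibility axis
corr = sxy / sqrt(sxx * syy)           # Pearson correlation

print(f"Int = {slope:.3f} Lex {intercept:+.1f}")
print(f"Corr = {corr:.5f}   %EV = {100 * corr ** 2:.1f}")
```

Run on these twelve points, the fit gives a slope of 1.540, an intercept of -81.3, a correlation of .89944 and 80.9% explained variation, matching the printed line and statistics.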

(5) Mazatec

[Scattergram: INTELLIGIBILITY plotted against LEXICAL SIMILARITY]

All points (_____):     Int = 1.766 Lex - 81.5
Excluding x's (- - -):  Int = 1.957 Lex - 99.4

              N    %EV    Corr     Sig    SEE  Lex-100    Int-0
All points   19   65.1  .80659   .0001   13.1     95.1     46.1
Excl x's     17   71.7  .84672   .0001   12.1     96.3     50.8

(6) Polynesia

[Scattergram: INTELLIGIBILITY plotted against LEXICAL SIMILARITY]

All points (_____):     Int = 1.588 Lex - 67.2
Excluding x's (- - -):  Int = 1.563 Lex - 68.0

              N    %EV    Corr     Sig    SEE  Lex-100    Int-0
All points   77   74.6  .86350   .0001   14.4     91.6     42.3
Excl x's     67   83.0  .91091   .0001   11.5     88.3     43.5

(7) Siouan

[Scattergram: INTELLIGIBILITY plotted against LEXICAL SIMILARITY]

All points (_____):     Int = 4.385 Lex - 336.0
Excluding x's (- - -):  Int = 4.560 Lex - 355.4

              N    %EV    Corr     Sig    SEE  Lex-100    Int-0
All points   25   64.9  .80543   .0001   18.1    102.5     76.6
Excl x's     20   74.2  .86156   .0001   15.9    100.6     77.9

(8) Trique

[Scattergram: INTELLIGIBILITY plotted against LEXICAL SIMILARITY]

All points (_____):     Int = 1.405 Lex - 41.3
Excluding x's (- - -):  Int = 1.894 Lex - 90.5

              N    %EV    Corr     Sig    SEE  Lex-100    Int-0
All points   15   58.5  .76503   .0009   11.2     99.2     29.4
Excl x's     11   88.7  .94174   .0001    6.7     98.9     47.8

(9) Uganda

[Scattergram: INTELLIGIBILITY plotted against LEXICAL SIMILARITY]

All points (_____):     Int = 1.325 Lex - 52.2
Excluding x's (- - -):  Int = 1.459 Lex - 65.8

              N    %EV    Corr     Sig    SEE  Lex-100
All points   10   81.8  .90457   .0003   12.8     80.3
Excl x's      9   96.1  .98010   .0001    6.3     80.1

(10) Yuman

[Scattergram: INTELLIGIBILITY plotted against LEXICAL SIMILARITY]

All points (_____):     Int = 2.040 Lex - 106.2
Excluding x's (- - -):  Int = 1.978 Lex - 103.3

              N    %EV    Corr     Sig    SEE  Lex-100    Int-0
All points   25   96.6       ?   .0001    7.0     97.9     52.0
Excl x's     21   98.1       ?   .0001    5.2     94.4     52.2


1.4 Adjusting raw intelligibility scores

This section of the appendix sets forth eight tables which were used to select
the method of adjusting intelligibility scores for each set of data. Each table has
ten rows and six columns. There is one row for each of the ten field studies. The
first column is for statistics pertaining to the raw intelligibility scores; the
remaining five are for the five different methods of adjusting scores which were
tried. See Section 5.2.4 for a description of each adjusting method and the
rationale behind each, as well as for the general rationale behind the selection
process which is about to be illustrated. In each table, the underlined values
indicate the adjustment which was ultimately selected for each data set.

Table 1.1 gives the slope of the regression line for predicting the given
measure of intelligibility from lexical similarity. Table 1.2 gives the intercept
of the intelligibility axis for these same lines. Thus from these two tables, one
can reconstruct the predicting formula for the given set of data and the given type
of intelligibility adjustment. That is, predicted percentage of intelligibility
equals the slope times the percentage of lexical similarity, plus the intercept.
These regression analyses are performed only on the data points which are not
suspected of nonsymmetric social factors; only those points plotted as circles in
Appendix 1.3 are included.
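
The predicting formula can be applied directly. A minimal sketch, using the Ethiopia line for raw scores excluding x's from Appendix 1.3 (Int = 1.208 Lex - 32.4); the function names are illustrative and not part of the original analysis:

```python
def predict_intelligibility(lex: float, slope: float, intercept: float) -> float:
    """Predicted percentage of intelligibility for a given lexical similarity."""
    return slope * lex + intercept


def similarity_at_zero(slope: float, intercept: float) -> float:
    """Lexical similarity at which the line predicts 0% intelligibility."""
    return -intercept / slope


slope, intercept = 1.208, -32.4
print(round(predict_intelligibility(100, slope, intercept), 1))  # 88.4
print(round(similarity_at_zero(slope, intercept), 1))            # 26.8
```

Appendix 1.3 lists 88.5 rather than 88.4 for the first figure, presumably because it was computed from unrounded coefficients; the second figure agrees with the listed 26.8.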

Table 1.3 repeats the percentage of explained variation for each of the
regression lines. In Table 1.4 the percentage of explained variation for the raw
intelligibility model is subtracted from the percentage for each of the models with
adjusted intelligibility. The resulting figures show the net improvement in the
ability of the linear model to explain intelligibility after the intelligibility
scores are adjusted. A negative value, of course, indicates that the particular
adjustment actually lessens the percentage of explained variation. One goal in
selecting an adjustment was to find the value in the row which gave the highest
improvement in explained variation.



Table 1.5 reports the percentage of intelligibility which the regression lines
predict when similarity is 100%. Table 1.6 shows how much this value deviates from
the theoretically expected value of 100%. One goal in selecting an adjustment was
to find the value in the row (either positive or negative) which was nearest zero,
that is, was nearest the theoretical expectation.

Table 1.7 reports the value of similarity which predicts 0% intelligibility, in
other words, the intercepts on the similarity axis. This number gives an idea of
how much the predictions from the different models diverge. We expect all lines to
converge at 100%, so any differences will appear at the low end of the line. The
mean value of the similarity intercept was computed for all six models concerning
the eight field studies which give similar results (Biliau and Siouan are
excluded). This yields a mean intercept value of 40.8%. In Table 1.8, this mean
value is subtracted from the intercepts in Table 1.7. The resulting figures
indicate how nearly the given regression line is like all others. One goal in
selecting an adjustment was to find the value in the row (either positive or
negative) which was nearest zero, that is, which was nearest the overall trend of
all studies.
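
The three figures used in the selection can be computed for any one model from its slope, intercept and percentage of explained variation. The sketch below does this for the Buang proportional adjustment for subjects, taking the line and %EV from Appendix 1.5 and the raw %EV from Appendix 1.3; the function name and argument names are illustrative.

```python
MEAN_SIMILARITY_INTERCEPT = 40.8  # mean over the eight studies, as given in the text


def selection_criteria(ev_raw, ev_adj, slope, intercept):
    """Return (improvement, deviation from 100%, deviation from mean intercept)."""
    improvement = ev_adj - ev_raw                          # as in Table 1.4
    deviation_at_100 = (slope * 100 + intercept) - 100     # as in Table 1.6
    similarity_intercept = -intercept / slope              # as in Table 1.7
    deviation_from_mean = similarity_intercept - MEAN_SIMILARITY_INTERCEPT  # as in Table 1.8
    return improvement, deviation_at_100, deviation_from_mean


# Buang, proportional adjustment for subjects: Int = 1.431 Lex - 44.5,
# 65.5% explained variation against 65.8% for the raw scores.
figures = selection_criteria(65.8, 65.5, 1.431, -44.5)
print([round(value, 1) for value in figures])  # [-0.3, -1.4, -9.7]
```

The second figure corresponds to the 1.4% deviation from 100% cited for this adjustment in the discussion of Buang.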

In no case did all three of the above goals point to the same adjustment. A
subjective method of judging the relative importance of the three goals was used to
select the adjustments. Objective methods of ranking adjustments and then selecting
the adjustment with the highest average rank proved unsatisfactory because they
could not account for the qualitative differences between options. For instance,
for Buang the hometown adjustment gives the best improvement in explained variation
(Table 1.4) and the model with the similarity intercept nearest the average (Table
1.8). On the strength of the highest rank on these two goals, the hometown
adjustment turns out to have highest average rank, even though for the deviation
from 100% (Table 1.6) it has the second lowest rank. However, a model which
deviates by 14.8% from the theoretical expectation of 100% intelligibility for
complete similarity is unacceptable. Therefore, the proportional adjustment for
subjects, which ranks second on the other two goals but deviates from 100% by only
1.4%, was selected.

Table 1.1 Slope

                 Raw    Home-  Subject  Subject     Test     Test
                         town     Prop     Cons     Prop     Cons
Biliau          0.52     0.79     0.54     0.52     0.59     0.56
Buang           0.99     1.57     1.43     1.02     1.39     0.94
Ethiopia        1.21     1.36     1.31     1.19     1.32     1.17
Iroquois        1.54     2.16     2.10     1.56     2.07     1.55
Mazatec         1.96     2.20     2.00     1.88     2.08        ?
Polynesia       1.56     1.63     1.63     1.58     1.64     1.59
Siouan          4.56     4.99     4.95     4.59     5.00     4.76
Trique          1.89     1.99     1.93     1.90     1.93     1.90
Uganda          1.46     1.74     1.79     1.44     1.81     1.46
Yuman           1.98     2.11     2.13     1.95     2.15     2.00

Table 1.2 Intelligibility intercept

                 Raw    Home-  Subject  Subject     Test     Test
                         town     Prop     Cons     Prop     Cons
Biliau          41.5     19.4     45.4     48.3     39.7     42.8
Buang          -30.7    -72.2    -44.5     -2.1    -41.1      4.3
Ethiopia       -32.4    -39.5    -35.?    -23.1    -34.7    -20.3
Iroquois       -81.3   -118.3   -110.2    -55.8   -106.7    -52.6
Mazatec        -99.4   -119.3    -98.6    -86.6   -106.4    -95.8
Polynesia      -68.0    -72.0    -71.2    -64.6    -71.6    -65.4
Siouan        -355.4   -393.4   -386.5   -351.5   -390.8        ?
Trique         -90.5    -98.4    -92.2    -89.3    -92.?    -89.?
Uganda         -65.8    -83.0    -80.4    -44.8    -82.1    -47.1
Yuman         -103.3   -111.4   -110.9    -93.8   -112.0    -97.3

Table 1.3 Percentage of explained variation

                 Raw    Home-  Subject  Subject     Test     Test
                         town     Prop     Cons     Prop     Cons
Biliau          74.2     77.2     67.9     67.5     78.0     78.0
Buang           65.8     71.2     65.5     64.3     64.2     63.1
Ethiopia        75.1     78.9     75.4     71.5     75.1     74.1
Iroquois        80.9     92.0     88.6     80.4     85.2     81.3
Mazatec         71.7     77.6     67.9     66.6     71.1     70.8
Polynesia       83.0     83.3     83.6     84.3     83.0     83.2
Siouan          74.2     7?.7     74.4     74.1     74.5        ?
Trique          88.7     90.0     87.9     87.7     89.0     88.8
Uganda          96.1     93.5     96.4     96.0     96.2     96.3
Yuman           98.1     98.9     9?.9     99.4     93.4     97.8

Table 1.4 Improvement over raw intelligibility

                 Raw    Home-  Subject  Subject     Test     Test
                         town     Prop     Cons     Prop     Cons
Biliau           0.0        ?        ?     -6.7      3.8      3.7
Buang            0.0        ?        ?     -1.4        ?        ?
Ethiopia         0.0        ?        ?     -3.6      0.0        ?
Iroquois         0.0        ?        ?     -0.5        ?      0.4
Mazatec          0.0        ?        ?     -5.1     -0.6     -0.9
Polynesia        0.0        ?        ?      1.4      0.1      0.2
Siouan           0.0        ?        ?        ?      0.3      1.1
Trique           0.0        ?        ?     -1.0      0.3      0.1
Uganda           0.0        ?        ?     -0.1      0.1      0.2
Yuman            0.0        ?        ?      1.3      0.2     -0.3

» 
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Tablt'l.5 Predicted* intelligibility for 100% similarity 



"el? * 



Buang 
Ethiopia 
Iroquois 
Mazatec 
Polynesia 
Sioupn v 
f rlque * r 
Ug anda° 
tyuman 



93.4 
68.0 
8$. 5 
72.7 

.3 
100.6 
98.9 
80,1 
9 4 '. 4 



Home- 
town 

98.1 

85.2 

96.0 

' 97.6 

100*.9 

91, Q 

105,8 

100 .6 

* 90 . 6 

99.3 



Subject 
Prop Cons 



Teat 



99.8 

96 .2 

99,4 
101.7 

92 . 3 
1Q8.6 
100.8 

98.8 
101.6" 



99.8 
99.3 
' 95.5 
,100,3 
101.3/ 
93.0 
107,7 
100 . 8 
99.2 
'101*6 



<3* 



Prop 
99.1 
98.2 
,97.1 

ioo.i 

102; 0' 
92.7 

109.2 

100 . 5 
99*3 

102.7 



Con 8' 

99.2 
.98.6 

97,1 
102.6 
1Q1. "9 

93.4 
109.0 
100.4 

99.4 
102.8 



tERIC 



Table. 1.6 Deviation from 100% Intel 



•'»■ 

Biliau > V 
, Buang • 

Ethiopia 
Iroqueis, 

t 

Maza^sc a! 
Polynesia' 
Slouan / V 
Trlque • 
Ugandan* 19. 9 
Yuman **. , 



R*w ■ . .Home- 
. town 



-Subject 
Prop 



:" -6 *6 
-j32ya 
*l;Ls 
^27.3 ;> 

,-3. 7-; 

-11.7 . 
0-.6 ; 

-l.,0 

-9.4 
. -5.6 ' 



>U9 

. ■ y i 7 ' "" ' 

-14*8 




■ '- ■ —i^ 
r i 



Table 1,-7. Percentage of similarity for 
predicted 01 intelligibility 

r Raw, Home* Subject - 



Test 





• 




town 


Prop 


coha 




Cofis 


Biliau 


-79 


.9- 

* 


— O A 

-24 


• 5 




A 


-93.7' 


-66. 


8 


-75.9 


Buang 

s 


31, 


. 1 . 








1- 


2. 3 • 


29 . 




>4.6 


Ethiopia 


. .26 


.8 




. 2 


26 . 


7 


19.5 


26. 


3 


17. 3 


Iroquois 


55. 


,8 


54 


.8 


M9. 


2 


. 35. r 


51. 


6 


33.9 


Ma za ter 




o 

> o - 


54 


.2 


; 49. 


2 


AC n 

46,0 


51. 


0 * 


48.5 


Polynesia 


43. 


5 - 


44 


•? 


43; 


6 


41.0 ' 


. 43'. 


6 , 


41.2 


Sibuan 


77. 


9 


78 


.8 


'78. 


r ■ 


,76.5' 


78 . 


2 


77.1 


cTr-iqua 


47, 


.8 " 


49 


.4 


.47. 


8 


47.0 


47. 


9 


47.0 


Uganda 


45. 


1 


. 47 


.8 


44,.* 




31'. 1 


45. 


3 


32.1 


Yuman^ 
























52. 


2 


• ' 52 


.9 


52. 


2 


48.0 


52. 


2 


, 48.6 



v, 4. 



Table 1.8 Deviation from the mean value of 40.8%



Biliau 
Buang 
Ethiopfa v 
-% s ^IroSquoif 

■ Ha ta tec 
f, , . Polynesia 
# Siouan ' ; 
Trique 
' Uganda 

a 

Yunian 



Raw 

-1,20.. 7 
-9.7 

-14.0 
12.0 

-10*0 
2.7 

,7.0 
4.3 
11.4 



Home- Subject 
town Prop . Con* 



5.1 
-11.6 
14.0 



-14 :i 
' U.8 

■ ! I t , 

8.-4 



5 3.,4- ^ 2 .8 



38.0 V 

8^6 
7.0 
12.1' 



37.-3 
7.0 
4.0 

11.4- 



t .'3 



-38.6 
-21. 3* 

• -5^0 

5.2 
0.2 

* 35V7 
6.2 

',-9.7 
7.2 



■ Test 
Prop Cons 



-65.+ -124.2-134.5 -107;6 
-9.7 



-liq 
-14.5 
10.8 
10.2 
2.8 
37. 3 
• 7.1 



-116.7 
-45.4 
-23. 5 
-6.9 
7. 7 
•0,4 
36,. 3 
6.3 



'4^5^' -8.7 



11*4 



7.8 
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1.5 Scattergrams for adjusted intelligibility 



(1) Biliau 



[Scattergram: intelligibility plotted against lexical similarity]

All points:     Int = .506 Lex + 48.0
Excluding x's:  Int = .788 Lex + 19.4

              N   %EV    Corr    Sig   SEE  Lex-100  Int-0
All points    9  40.6  .63679  .0652   6.1     98.6  -94.8
Excl x's      6  77.2  .87855  .0212   4.5     98.1  -24.6
(2) Buang



[Scattergram: intelligibility plotted against lexical similarity]

All points:     Int = 1.148 Lex - 16.0
Excluding x's:  Int = 1.431 Lex - 44.5

              N   %EV    Corr    Sig   SEE  Lex-100  Int-0
All points   21  49.3  .70216  .0004  16.7     98.9   13.9
Excl x's     15  65.5  .80949  .0003  15.6     98.6   31.1
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(3) Ethiopia 



[Scattergram: intelligibility plotted against lexical similarity]

All points:     Int = 1.354 Lex - 37.3
Excluding x's:  Int = 1.356 Lex - 39.6

              N   %EV    Corr    Sig   SEE  Lex-100  Int-0
All points   30  76.0  .87187  .0001  16.0     98.1   27.6
Excl x's     23  78.9  .88827  .0001  16.3     96.0   29.2
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(4) Iroquois 



[Scattergram: intelligibility plotted against lexical similarity]

All points:     Int = 2.069 Lex - 104.3
Excluding x's:  Int = 2.096 Lex - 110.2

              N   %EV    Corr    Sig   SEE  Lex-100  Int-0
All points   14  77.1  .87796  .0001  21.8    102.6   50.
Excl x's     12  88.6  .94131  .0001  15.9     99.4   52.



(5) Mazatec 



[Scattergram: intelligibility plotted against lexical similarity]

All points:     Int = 1.995 Lex - 99.8
Excluding x's:  Int = 2.202 Lex - 119.3

              N   %EV    Corr    Sig   SEE  Lex-100  Int-0
All points   19  71.0  .84272  .0001  12.9     99.6   50.1
Excl x's     17  77.6  .88111  .0001  11.7    100.9   54.2
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(6) Polynesia 



[Scattergram: intelligibility plotted against lexical similarity]

All points:     Int = 1.605 Lex - 64.0
Excluding x's:  Int = 1.576 Lex - 64.6

              N   %EV    Corr    Sig   SEE  Lex-100  Int-0
All points   77  75.8  .87084  .0001  14.1     96.5   39.9
Excl x's     67  84.3  .91840  .0001  11.0     92.9   41.0



(7) Siouan 



[Scattergram: intelligibility plotted against lexical similarity]

All points:     Int = 4.792 Lex - 371.8
Excluding x's:  Int = 4.992 Lex - 393.4

              N   %EV    Corr    Sig   SEE  Lex-100  Int-0
All points   2?  70.8  .84143  .0001  17.2    107.4   77.6
Excl x's     20  79.7  .89298  .0001  14.9    105.8   78.8
(8) Trique



[Scattergram: intelligibility plotted against lexical similarity]

All points:     Int = 1.495 Lex - 48.6
Excluding x's:  Int = 1.990 Lex - 98.4

              N   %EV    Corr    Sig   SEE  Lex-100  Int-0
All points   15  61.8  .78600  .0005  11.2    100.9   32.5
Excl x's     11  90.0  .94864  .0001   6.5    100.6   49.4
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(9) Uganda 



[Scattergram: intelligibility plotted against lexical similarity]

All points:     Int = 1.632 Lex - 64.0
Excluding x's:  Int = 1.792 Lex - 80



Excl x's 



N 


%EV 


Corr 


Sig 


10 


. 82.7 


.90952 


.0003 


9 


96.4 


.98174 


.0001 



SEE 



7.4 



Lex-100 



98.8 



193 



188 



(10) Yuman 



[Scattergram: intelligibility plotted against lexical similarity]

All points:     Int = 2.015 Lex - 96.7
Excluding x's:  Int = 1.955 Lex - 93.8

              N   %EV    Corr    Sig   SEE  Lex-100  Int-0
All points   15  97.6  .98772  .0001   5.9    104.8   48.0
Excl x's     1?  99.4  .99715  .0001   2.8    101.6   48.0
COMPLETE DATA FOR THE STUDY OF INTELLIGIBILITY ON SANTA CRUZ ISLAND



2.1 Description of the data 



The raw data from the Santa Cruz dialect survey are reported in full in Simons
1977a. For the current study those data have been preprocessed in a few ways,
primarily in order to give uniform dimensions and dialect labels to all data tables.
In the raw data, different sets of data have different numbers of rows and columns
or different row and column labels. For the present study, many of those rows and
columns are combined and many are renamed to make all data tables uniform and
comparable.

Thirteen dialects are used in both dimensions of all tables. These are the
thirteen points at which intelligibility was tested. The first section of this
appendix gives the mnemonic codes for the thirteen test points and a listing of the
villages they represent. In the sections that follow all the data used in the
analysis are described and listed.



2.1.1 The thirteen dialects

The thirteen dialects used throughout this study are listed below. Each is
viewed as a unique dialect made up of one or more villages. When a number of
villages are combined, the villages are near neighbors and their speech varieties
are identical or very nearly so. The term dialect is used loosely here. It makes
no suggestion of how different the speech varieties are; it only implies that the
speech communities are in some way distinct, either spatially or linguistically, or
both.

Figures 6.1 and 6.2 in Section 6.1.3 give sketch maps of the island showing the
location of the dialects. It should be noted that the villages along the northeast
shore of the island are omitted. This is because they are small and are all recent
migrations from more populous villages which are included in the study. Likewise,
the eastern tip of the island, which does not appear in the maps, is inhabited only
by recent immigrants from another island. The mnemonic codes for the dialects and
the villages they represent are as follows:



 (1)  NEO = Neo
 (2)  MAT = Matu
 (3)  BAN = Mbanua, Noole, Lwepe, Moneu, Monao, Nou, Uta
 (4)  NEP = Nepa, Palo, Mbalo, Mateone, Nepu, Io, Napo
 (5)  LWO = Lwowa, Malo
 (6)  VEN = Venga
 (7)  NEM = Nemba
 (8)  BYO = Mbanyo, Manoputi, Manamini
 (9)  NOP = Noepe, Mbapo, Monan
(10)  NEA = Nea, Nemboi
(11)  NOO = Nooli, Nonia, Mbonembwe
(12)  MBI = Mbimba
(13)  NNG = Nanggu, Utongo

2.1.2 Population

The population of the thirteen dialects is as follows:

NEO      200
MAT      120
BAN      450
NEP      320
LWO      370
VEN      290
NEM      180
BYO      180
NOP      140
NEA      220
NOO      280
MBI      140
NNG      200

Total   3090



' . / ■ I 

2.1.3 Geographic distance

The distance between dialects is measured between their main villages. The
main village is the one listed first in the list just given. Distance is measured
in terms of the number of minutes required to travel between the dialects. These
figures must be viewed as approximations at best. In most cases, they are walking
times. In the case of Neo, Matu, Mbimba, and Nanggu, boating (either sailing or
paddling or both) is involved for certain stretches. With the recent advent of
roads, vehicles, and outboard motors within the past one or two decades, many of
these distances have been shortened. However, such means of transportation are
still not available to everyone.

Table 2.1 shows the minutes of traveling time between the dialects. The
figures on the diagonal, which represent the distance from a dialect to itself, are
an approximation to the radius of the dialect. When the dialect consists of only a
single village, the distance is given as 5 minutes. If it consists of two or more
villages, the average distance from the central village to the others is given.

The last column of Table 2.1 gives the average distance from a speaker of the
given dialect to all other inhabitants of the island. A simple average of the
distance to all dialects could have been computed by summing the figures in a row
and dividing by thirteen. However, such a statistic does not take into account the
differing populations of the dialects. Therefore the average distance from an
individual to all other individuals is computed. This is done by multiplying each
distance in a row by the population of the dialect for the column. Then the results
are summed and divided by the total population of the island, that is, 3090. The
result is the average distance separating an individual of that dialect from all
other individuals on the island. For instance, for BAN the average distance is 180
minutes. One way to interpret this figure is that by traveling no more than 180
minutes from home, a BAN person could come into contact with half of the
inhabitants of the island.

Table 2.1 Geographic distance
Values are minutes of traveling time

      NEO  MAT  BAN  NEP  LWO  VEN  NEM  BYO  NOP  NEA  NOO  MBI  NNG  Mean

NEO     5  120  155  185   95  155  260  330  435  345  415  775  805   276
MAT   120    5  180  210  150  210  315  385  490  370  440  800  830   313
BAN   155  180   10   30   60   70  175  245  295  190  260  620  650   180
NEP   185  210   30   15   90  100  205  275  265  160  230  590  620   185
LWO    95  150   60   90   15   60  165  235  340  250  320  680  710   200
VEN   155  210   70  100   60    5  105  175  280  260  330  690  720   202
NEM   260  315  175  205  165  105    5   70  175  280  415  775  805   262
BYO   330  385  245  275  235  175   70   30  105  210  345  705  735   281
NOP   435  490  295  265  340  280  175  105   10  105  240  600  630   299
NEA   345  370  190  160  250  260  280  210  105   10  135  495  525   238
NOO   415  440  260  320  320  330  415  345  240  135   15  360  390   292
MBI   775  800  620  590  680  690  775  705  600  495  360    5   30   562
NNG   805  830  650  620  710  720  805  735  630  525  390   30    5   588
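The population-weighted mean in the last column of Table 2.1 can be checked with a
short calculation. The following is a minimal sketch in Python, not part of the
original study; it uses the BAN row of Table 2.1 and the populations of Appendix
2.1.2, and the function name weighted_mean_distance is purely illustrative.

```python
# Populations of the thirteen dialects (Appendix 2.1.2), in table order:
# NEO MAT BAN NEP LWO VEN NEM BYO NOP NEA NOO MBI NNG
POPULATION = [200, 120, 450, 320, 370, 290, 180, 180, 140, 220, 280, 140, 200]

# The BAN row of Table 2.1 (minutes of traveling time, diagonal included).
BAN_ROW = [155, 180, 10, 30, 60, 70, 175, 245, 295, 190, 260, 620, 650]


def weighted_mean_distance(row, population):
    """Average distance from one individual to every individual on the island.

    Each distance is weighted by the population of the dialect in that
    column, and the weighted sum is divided by the total population.
    """
    total_population = sum(population)
    weighted_sum = sum(d * p for d, p in zip(row, population))
    return weighted_sum / total_population


# The unweighted mean treats every dialect alike, whatever its size.
simple_mean = sum(BAN_ROW) / len(BAN_ROW)
weighted_mean = weighted_mean_distance(BAN_ROW, POPULATION)

print(f"simple mean:   {simple_mean:.1f} minutes")
print(f"weighted mean: {weighted_mean:.1f} minutes")
```

The weighted figure rounds to the 180 minutes given for BAN, while the simple mean
of the same row is considerably larger because the distant but small dialects count
as much as the large nearby ones.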



In Table 2.2, the distances in each row of Table 2.1 are divided by the average
distance for that dialect and then multiplied by 100 to convert them to percentages.
The result is a table of distances measured relative to the perspective of each
dialect (see Section 6.1.3). The rows are labeled "From:" while the columns are
labeled "To:". For instance, the distance from Nanggu (NNG) to Mbanua (BAN) is only
110% of the average distance from NNG, while from BAN to NNG it is 360% of the
average distance from BAN. This suggests that from an insider's perspective, a NNG
speaker views BAN as being nearer to his own village than a BAN speaker would view
the distance to NNG. Note that the distances in Table 2.1 are symmetric (that is,
the same in both directions), while those in Table 2.2 are not.
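The asymmetry described above can be reproduced from two rows of Table 2.1. The
sketch below is illustrative Python, not the author's program; it assumes the
relative figures are based on the unrounded population-weighted means, and the names
weighted_mean and relative_row are hypothetical.

```python
# Populations in table order: NEO MAT BAN NEP LWO VEN NEM BYO NOP NEA NOO MBI NNG
POPULATION = [200, 120, 450, 320, 370, 290, 180, 180, 140, 220, 280, 140, 200]

# Rows of Table 2.1 for the two dialects in the example (minutes).
BAN_ROW = [155, 180, 10, 30, 60, 70, 175, 245, 295, 190, 260, 620, 650]
NNG_ROW = [805, 830, 650, 620, 710, 720, 805, 735, 630, 525, 390, 30, 5]

BAN, NNG = 2, 12  # column positions of BAN and NNG in the table order


def weighted_mean(row, population):
    """Population-weighted mean of one row."""
    return sum(d * p for d, p in zip(row, population)) / sum(population)


def relative_row(row, population):
    """Express each distance as a percentage of the row's weighted mean."""
    mean = weighted_mean(row, population)
    return [100 * d / mean for d in row]


ban_relative = relative_row(BAN_ROW, POPULATION)
nng_relative = relative_row(NNG_ROW, POPULATION)

# The raw distance is the same in both directions (650 minutes) ...
print(BAN_ROW[NNG], NNG_ROW[BAN])
# ... but the relative distance depends on whose mean is the yardstick.
print(f"NNG to BAN: {nng_relative[BAN]:.0f}% of the NNG mean")
print(f"BAN to NNG: {ban_relative[NNG]:.0f}% of the BAN mean")
```

Because NNG lies far from most of the island's population, its mean distance is
large and a 650-minute trip is close to ordinary from its perspective; for centrally
placed BAN the same trip is several times the average.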
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Table 2.2 Relative geographic distance 



Values are percentage of mean distance from the origin point

                                     To:
From:  NEO  MAT  BAN  NEP  LWO  VEN  NEM  BYO  NOP  NEA  NOO  MBI  NNG

NEO      2   43   56   67   34   56   94  120  158  125  150  281  292
MAT     38    2   58   67   48   67  101  123  157  118  141  256  265
BAN     86  100    6   17   33   39   97  136  164  105  144  344  360
NEP    100  113   16    8   49   54  111  148  143   86  173  318  335
LWO     48   75   30   45    8   30   83  118  170  125  160  340  355
VEN     77  104   35   50   30    2   52   87  139  129  164  342  357
NEM     99  120   67   78   63   40    2   27   67  107  158  295  307
BYO    117  137   87   98   84   62   25   11   37   75  123  251  261
NOP    146  164   99   89  114   94   59   35    3   35   80  201  211
NEA    145  156   80   67  105  109  118   88   44    4   57  208  221
NOO    142  151   89   79  110  113  142  118   82   46    5  123  134
MBI    138  142  110  105  121  123  138  125  107   88   64    1    5
NNG    137  141  110  105  121  122  137  125  107   89   66    5    1



2.1.4 Density of population

In Appendix 2.1.2, population was measured in absolute terms. It can also be
measured relatively with respect to the whole dialect system (Section 6.1.3) by
computing density of population. When population is viewed in terms of its density
rather than in absolute numbers, one is hypothesizing that the attraction of a
dialect could be enhanced by the nearness of its neighbors; likewise, motivation for
its speakers to travel widely to engage in contact might be diminished.

Here density is computed roughly in terms of people per square mile. Actually,
no miles are measured. Rather, the traveling distances in minutes are divided by
twenty to give a rough approximation to miles. The density at a dialect is computed
as the density in the square mile in which the dialect is located. The contribution
of the dialect itself is arbitrarily set at its population (even when it may cover
more than a square mile). The contribution of the other dialects is computed as
follows. The population of another dialect is viewed as evenly distributed over a
circular area which has a radius equal to the distance between the two dialects.
This distance is squared and then multiplied by pi (3.1416) to give the area of the
circle, and the population divided by that area is its contribution to the density
at the first dialect. For each dialect the contributions of the other twelve are
computed and added to the population of the original community. The result is a
measure of population density at each dialect. The results are as follows:

NEO   212
MAT   128
BAN   520
NEP   397
LWO   407
VEN   324
NEM   194
BYO   191
NOP   148
NEA   229
NOO   285
MBI   169
NNG   221
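The density computation can be traced step by step for a single dialect. This is a
minimal sketch in Python under one stated assumption: each neighbor's contribution
is taken to be its population divided by the area of the circle, which is the
reading that reproduces the printed figure for BAN. The function name density and
the constant MINUTES_PER_MILE are illustrative only.

```python
import math

# Populations in table order: NEO MAT BAN NEP LWO VEN NEM BYO NOP NEA NOO MBI NNG
POPULATION = [200, 120, 450, 320, 370, 290, 180, 180, 140, 220, 280, 140, 200]

# The BAN row of Table 2.1 (minutes of traveling time).
BAN_ROW = [155, 180, 10, 30, 60, 70, 175, 245, 295, 190, 260, 620, 650]
BAN = 2  # position of BAN in the table order

MINUTES_PER_MILE = 20  # traveling minutes are divided by twenty to approximate miles


def density(own_index, row, population):
    """People per square mile at one dialect.

    The dialect's own population counts in full. Every other dialect is
    spread evenly over a circle whose radius is the distance to it
    (assumption: contribution = population / circle area).
    """
    total = float(population[own_index])
    for j, (minutes, people) in enumerate(zip(row, population)):
        if j == own_index:
            continue  # the dialect itself was counted above
        radius_miles = minutes / MINUTES_PER_MILE
        area = math.pi * radius_miles ** 2
        total += people / area
    return total


ban_density = density(BAN, BAN_ROW, POPULATION)
print(round(ban_density))
```

Under that assumption the result for BAN rounds to 520, matching the list above.
Most of the increase over BAN's own 450 people comes from its nearest neighbors,
since the contribution falls off with the square of the distance.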



9 
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2.1.5 Lexical similarity 

Lexical similarity between the dialects was measured as a gauge of their
linguistic similarity. The computation of cognate percentages is based on the
Swadesh 100-word list. The lists were collected by Richard Buchan and are
reproduced in full in Simons 1977a. The percentage of lexical cognates between all
the speech communities is given in Table 2.3. Items were judged cognate simply on
the basis of phonetic similarity. No attempt was made to distinguish between direct
inheritance and indirect inheritance through borrowing.

2.1.6 Lexical distance

The linguistic distance between dialects is approximated by computing lexical
distance. Lexical distance is the percentage of basic vocabulary that is not
cognate. This is computed by subtracting the cognate percentages in Table 2.3 from
100%.

The lexical distance between dialects is given in Table 2.4. In the last
column of the table, the average lexical distance separating an individual of each
dialect from all other individuals on the island is given. This average distance is
computed just as described for geographic distance in Appendix 2.1.3. In Table 2.5
the lexical distance figures in each row are divided by the average distance for the
row to derive a relative, nonsymmetric measure of linguistic distance. The
interpretation of these figures is analogous to the interpretation discussed in
Appendix 2.1.3 for relative geographic distance.
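The three steps just described (subtract from 100, take the population-weighted
mean, divide each distance by that mean) can be followed for one row. The sketch
below is illustrative Python using the BAN row of Table 2.3; the function names are
hypothetical and the code is not taken from the original study.

```python
# Populations in table order: NEO MAT BAN NEP LWO VEN NEM BYO NOP NEA NOO MBI NNG
POPULATION = [200, 120, 450, 320, 370, 290, 180, 180, 140, 220, 280, 140, 200]

# The BAN row of Table 2.3 (percentage of cognates).
BAN_SIMILARITY = [85, 95, 100, 87, 96, 93, 85, 77, 74, 72, 65, 65, 54]


def lexical_distance(similarity_row):
    """Percentage of basic vocabulary that is not cognate."""
    return [100 - s for s in similarity_row]


def weighted_mean(row, population):
    """Population-weighted mean, as for geographic distance."""
    return sum(d * p for d, p in zip(row, population)) / sum(population)


ban_distance = lexical_distance(BAN_SIMILARITY)
ban_mean = weighted_mean(ban_distance, POPULATION)
ban_relative = [100 * d / ban_mean for d in ban_distance]

print(ban_distance)                       # the BAN row of a distance table
print(f"{ban_mean:.1f}")                  # its mean column
print([round(r) for r in ban_relative])   # the relative distances
```

The mean comes out at 16.8, the value in the last column of Table 2.4 for BAN, and
the first relative figure is 89, as in Table 2.5.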

2.1.7 Measured intelligibility

Intelligibility between dialects was measured using the technique described in
Section 2.1. The responses were scored on the four point scale described in Section
2.1.4. The results of the intelligibility testing are displayed in Table 2.6. The
responses given are what I judged to be the norms for the dialects taking the test.
When an individual having close contact with the dialect on the test tape dominated
the beginning of a test, I directed questions to other members of the group in order
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Table 2.3 Lexical similarity 
Values are percentage of cognates

      NEO  MAT  BAN  NEP  LWO  VEN  NEM  BYO  NOP  NEA  NOO  MBI  NNG

NEO   100   87   85   83   87   86   78   70   68   65   59   59   50
MAT    87  100   95   86   97   95   85   75   72   68   63   63   53
BAN    85   95  100   87   96   93   85   77   74   72   65   65   54
NEP    83   86   87  100   89   87   83   75   78   74   66   66   54
LWO    87   97   96   89  100   98   87   77   74   72   65   65   54
VEN    86   95   93   87   98  100   86   76   72   70   63   63   53
NEM    78   85   85   83   87   86  100   84   78   75   70   70   59
BYO    70   75   77   75   77   76   84  100   88   80   73   73   63
NOP    68   72   74   78   74   72   78   88  100   88   78   78   64
NEA    65   68   72   74   72   70   75   80   88  100   85   85   68
NOO    59   63   65   66   65   63   70   73   78   85  100  100   72
MBI    59   63   65   66   65   63   70   73   78   85  100  100   72
NNG    50   53   54   54   54   53   59   63   64   68   72   72  100
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Table 2.4 Lexical distance 

Values are percentage of non-cognates

      NEO  MAT  BAN  NEP  LWO  VEN  NEM  BYO  NOP  NEA  NOO  MBI  NNG  Mean

NEO     0   13   15   17   13   14   22   30   32   35   41   41   50  23.1
MAT    13    0    5   14    3    5   15   25   28   32   37   37   47  17.8
BAN    15    5    0   13    4    7   15   23   26   28   35   35   46  16.8
NEP    17   14   13    0   11   13   17   25   22   26   34   34   46  19.0
LWO    13    3    4   11    0    2   13   23   26   28   35   35   46  15.9
VEN    14    5    7   13    2    0   14   24   28   30   37   37   47  17.4
NEM    22   15   15   17   13   14    0   16   22   25   30   30   41  19.3
BYO    30   25   23   25   23   24   16    0   12   20   27   27   37  22.8
NOP    32   28   26   22   26   28   22   12    0   12   22   22   36  23.1
NEA    35   32   28   26   28   30   25   20   12    0   15   15   32  23.7
NOO    41   37   35   34   35   37   30   27   22   15    0    0   28  27.6
MBI    41   37   35   34   35   37   30   27   22   15    0    0   28  27.6
NNG    50   47   46   46   46   47   41   37   36   32   28   28    0  38.7



Table 2.5 Relative lexical distance

Values are percentage of mean distance from origin point

                                     To:
From:  NEO  MAT  BAN  NEP  LWO  VEN  NEM  BYO  NOP  NEA  NOO  MBI  NNG

NEO      0   56   65   74   56   61   95  130  139  152  177  177  216
MAT     73    0   28   79   17   28   84  140  157  180  208  208  264
BAN     89   30    0   78   24   42   89  137  155  167  209  209  274
NEP     90   74   69    0   58   69   90  132  116  137  179  179  243
LWO     82   19   25   69    0   13   82  145  164  176  221  221  290
VEN     80   29   40   75   11    0   80  138  161  172  213  213  270
NEM    114   78   78   88   67   73    0   83  114  130  156  156  213
BYO    131  110  101  110  101  105   70    0   53   88  118  118  162
NOP    138  121  112   95  112  121   95   52    0   52   95   95  156
NEA    148  135  118  110  118  126  105   84   51    0   63   63  135
NOO    149  134  127  123  127  134  109   98   80   54    0    0  102
MBI    149  134  127  123  127  134  109   98   80   54    0    0  102
NNG    129  121  119  119  119  121  106   96   93   83   72   72    0



to assess how well the majority was understanding. This latter assessment is
reported in the table of scores. The periods indicate that intelligibility was not
tested for that particular pairing of dialects. The dialects listed along the left
hand side of the table are those which listened to the test tapes. Those listed
along the top are those which were the speakers on the test tapes. Thus, the "2" in

the top row of the table means that the people from Neo scored partial
intelligibility when they listened to the dialect of Nepu.

2.1.8 Opinions about intelligibility

Before intelligibility tests were given, the members of the group were asked
how well they understood the other dialects on Santa Cruz. The question asked
was: "How much of the speech of village X do you understand?" The answers were
scored on a three point scale: 2 = all of it, 1 = some of it, 0 = none of it. The
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+ Table 2.6 Measured intelligibility 



3 = Full intelligibility
2 = Partial intelligibility
1 = Sporadic recognition



Dialect of speaker: 



of 



S 





NEO MAT BAN NEP LWO VEN NEM BYO NOP NEA NOO MBI NNG 


NEO 




•3 

• J 4 






• Z 


• • • 


MAT 


3 


"1 

J • a 




{ 2 


" \ 

) m 




„ BAN 






• • 




Z y Z 


2.1 


NEP 


3. " . 


3 3 


3 


3 


3 3 


2 . 1 


LWO 


3 


• • •> 




3 


3 2 


2 


VEN 


3- 


• • i 


3 3 


.3 


3 2 


2 . 1 


NEM 


• • 


• • i 


. . -3 


• 


3 


2 . 1 




3 


• • « 


• • 


,3 


. 3 " 


3" .1 


NOP 


3 . 


3 


• • 


3 


3 3 


3 .1 


NEA 


2 ' . 


> * 

• .3 . 




• 


V 3 


3 . '«>• 1 


NOO 


2 - 


3 ' . 


• • 


3 


3 


A 

3 . 3 


MB I 


2 


2 


•- e 


3 


. 3 


3 . 3 


NNG 


3 . 


3 . 


• • 


3 ' 


• • 


3 . 3 



results of this investigation are given in Table 2.7. The dialects listed along the
left hand side of the table are those to which the question was asked. Those listed
along the top are the ones which were asked about. Thus the score of "0" in the
top row of the table indicates that the people of Neo said they could not understand
any of the speech of Nanggu.

The bottom row and the rightmost column of the table give the attraction and
motivation of the dialects as indicated by these opinions (see Section 6.1.2.3).
These are weighted by population in the same manner as the average geographic and
lexical distance. That is, they are computed as an average intelligibility per
individual, rather than per dialect. This is done by multiplying the opinion scores
by the population of the intersecting dialect, summing, and dividing by total



Table 2.7 Opinions about intelligibility

"How much of the speech of village X do you understand?"

2 = understand all of it
1 = understand some of it
0 = understand none of it

Dialect asked about:
NEO MAT BAN NEP LWO VEN NEM BYO NOP NEA NOO MBI NNG Mot



aske#f * 

/ • 



NEO 


2 


2 


2. 


2 


2 

*• 


2 


' 2 


2 


2 


2 


2 


2 


0 


-93 " 
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MAT 


2 


2 


2 


2 


2 


2 


1 


a 


0 


0 


0 


0 


0 




BAN 


2 


2 


2 


2 


2 


' 2 


2 


i 


1 


1 


1 


1 


. o. 


) .74 

f. 


NFP 


o 




o 


2 


2 


z 






/ 


2 


2 


Z 


/ 

1 


\ 

• yo 


LWO 


2' 


2 


2 


-2 


2 


2 


2 


2 


2 


2 




2 


0 


• .93 


VEN 


2 


2 


Z 


2 


2 


2 


2 ■ 


2 


2 


2 


2 


2 


1 


.96 
































NEM 


2 


2 


2 


2 


■2 

t 


2 


2 


2 


2 


2 


2 


2 


1 


.97 


' BYO 


2 
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2 


2 


2' 


2 


2 


2 


2 


2 


2 


2 


\ t 


.97 


iNOP 


2 


2 


2 


. 2 


2 


2 


2 


2 


2 


2 


2 


2 


1 


.97 


NEA 


2 


2 


2 


2 


2 


2 


2 


2 


2 


2 


2 


2 • 


1 


































NOO 
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2 
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2 


- 2 
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.96 ^ 
































MB I 
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2 • 
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.97 ' 


NNG 
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2 " 


. 2 
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2 
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2 




Att 


.98 


1.0 


1,0 


1.0 1.0 


1.0 


.98 


.90 


.88 


.88 .88 


.88 


1 .33 
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population. It should be noted that these computations do not include the diagonal 
in the matrix; they refer only to the other twelve dialects. The scores are further
divided by two in order to convert them to a range of zero to one and make them
easier to interpret. The attraction figures (Att) can be interpreted as the
proportion of the island's population which claim to understand the given dialect.
Thus we see that 100% of the islanders claim to understand Mbanua (BAN) while only
33% claim to understand Nanggu (NNG). The motivation figures (Mot) can be
interpreted as the proportion of the island's population which the given dialect
claims to understand. Thus we see that NNG claims to understand 100% of the
islanders while BAN claims to understand only 74%.
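The motivation figure for one dialect can be worked through from its row of
Table 2.7. The sketch below is illustrative Python, not the author's program. It
rests on one assumption that should be noted: the text says only "total population",
and the divisor used here is the population of the other twelve dialects, since that
is what reproduces the .74 printed for BAN. The name motivation is hypothetical.

```python
# Populations in table order: NEO MAT BAN NEP LWO VEN NEM BYO NOP NEA NOO MBI NNG
POPULATION = [200, 120, 450, 320, 370, 290, 180, 180, 140, 220, 280, 140, 200]

# The BAN row of Table 2.7: how much of each dialect BAN speakers say they
# understand (2 = all of it, 1 = some of it, 0 = none of it).
BAN_OPINIONS = [2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 0]
BAN = 2        # position of BAN in the table order
MAX_SCORE = 2  # dividing by the top of the scale gives a range of zero to one


def motivation(own_index, scores, population, max_score=MAX_SCORE):
    """Population-weighted mean score over the other twelve dialects."""
    others = [j for j in range(len(scores)) if j != own_index]  # skip the diagonal
    weighted_sum = sum(scores[j] * population[j] for j in others)
    # Assumption: divide by the population of the other twelve dialects.
    other_population = sum(population[j] for j in others)
    return weighted_sum / other_population / max_score


ban_motivation = motivation(BAN, BAN_OPINIONS, POPULATION)
print(f"{ban_motivation:.2f}")
```

Attraction would be the same calculation run down a column instead of along a row,
weighting each score by the population of the dialect that gave the answer.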



2.1.9 Contact through church festivals

With the exception of some small newly established settlements and Graciosa
Bay, where four churches serve the 14 villages, every village on Santa Cruz has a
church. Each church takes its name from a saint or a feast day within the church
year (e.g. Resurrection, Trinity, Ascension). Once a year, on the appointed day of
its saint or feast, each church holds a festival. The festival begins with a
special communion service in the church. This is followed by feasting and dancing
which continues all night. The young people participate in sports competitions as
well. These festivals are a high point of the social year for the villages and they
are in fact the only times of feasting and dancing which are regularly scheduled on
the calendar.

Anyone has an open invitation to attend a festival and people always come from
many of the surrounding villages. Thus the frequency with which the people of one
village attend the festival of other villages gives a rough measure of the amount of
contact and interaction between the villages.

To determine the patterns of church festival attendance the following question
was asked of the group of people assembled for an intelligibility test: "How often
do people from your village attend the church festival at village X?" The responses
to the question were not always reliable. In some cases the person answered that
they went to all the festivals every time, but meant that they could go to any of
them at any time if they wished. In some cases an individual would answer only for
himself, instead of the village, telling how often he personally goes to the
festivals. In the first case the answers were consistently too high; in the second,
they were consistently too low. In spite of attempts to rephrase the question, the
proper kinds of response were not obtained in NEO, LWO, BAN, and NOP. Thus missing
values (signified by periods) are reported for these four villages. The responses
from Nanggu (NNG) look suspicious at first glance as they claimed that they attended
all of the festivals at least some of the time. This claim is, however, consistent
with their results on the intelligibility tests, their pattern of marriage ties, and
their own opinions as to how well they understood the other dialects.

The results of the church festival question are set out in Table 2.8. The
results are not strictly dialect to dialect contact; they are from central village
of a dialect to central village. The list of villages on the left hand side of the
table are the villages which were asked the question. The villages listed along the
top are the villages where festivals are held. Thus, the first "2" in the second
row of the table indicates that the people of Matu (MAT) attend the festival at Neo
(NEO) every year.

Attraction and motivation are computed as they were for the opinions in the
previous section. The attraction figures can be interpreted as the proportion of
the island's population which attend the festivals at that location. It must be
remembered, however, that attendance records for four of the dialects are not
included in the sum of the attending population, but are still included in the total
population figure. Thus the proportions are lower than they would be if comparable
data from the four were added. The motivation figures can be interpreted as the
proportion of the island's population which that group contacts in its festival
attendance. These two sets of proportions should be qualified by stating that they
do not apply to all individuals within the communities, but only to the delegations
which represent them at festivals.
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Table 2.8 Attendance at church festivals 

"• ^_ ■' 

"How often do people from your village attend
the church festival at village X?"

2 = every year
1 = only some years
0 = never (or very seldom)

                Village where festival is held:
Village asked:  NEO MAT BAN NEP LWO VEN NEM BYO NOP NEA NOO MBI NNG   Mot

NEO               .   .   .   .   .   .   .   .   .   .   .   .   .     .
MAT               2   2   2   2   2   2   0   0   0   0   0   0   0   .55
BAN               .   .   .   .   .   .   .   .   .   .   .   .   .     .
NEP               1   2   2   2   2   2   1   1   1   2   1   0   0   .70
LWO               .   .   .   .   .   .   .   .   .   .   .   .   .     .
2.1.10 Contact through marriage ties

The present day network of marriage ties on Santa Cruz is set out in Table 2.9.
At each of the thirteen intelligibility test points the people were asked how many
people (either male or female) from their immediate dialect group were married to a
person from each of the other dialects on the island. The answers to this question
should produce reciprocal responses. That is, the people of MAT should answer the
same number of marriages with BAN as the people of BAN answer for marriages with
MAT. Any discrepancies in the original data between the number of marriages as
reported by different villages were rectified by assuming that the higher number was
the correct number. This was done on the assumption that it was more likely that
people would fail to think of a marriage with a particular dialect than that they
would report one that was not really true.

The question, "How many people from your dialect are married to people from
dialect X?" was scored as follows: 0 = none; 1 = one; 2 = some (two to four);
3 = many (five or more). When the question was asked, the actual number of people
was requested. Sometimes, when many marriages were involved, the
people were not able to think of every one and give an absolute number. This,
combined with the discrepancies for which figures were adjusted and the
different sizes of the populations represented by the different dialects, makes a
scale of "none, one, some, many" preferable to the absolute numbers. The scale values
which appear in Table 2.9 were assigned on the basis of the adjusted actual number of
marriages reported.
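
The adjustment and scoring steps described above can be sketched as follows. This is only an illustration: the function names and the sample counts are hypothetical and do not come from the field data.

```python
def reconcile(reports):
    """Merge reciprocal reports, keeping the higher count for each pair.

    reports maps (village_asked, other_dialect) to the number of marriages
    reported at village_asked. All counts used below are hypothetical.
    """
    ties = {}
    for (asked, other), count in reports.items():
        pair = frozenset((asked, other))
        ties[pair] = max(ties.get(pair, 0), count)
    return ties


def scale_value(count):
    """Map an adjusted marriage count onto the 0-3 scale of Table 2.9."""
    if count == 0:
        return 0  # none
    if count == 1:
        return 1  # one
    if count <= 4:
        return 2  # some (two to four)
    return 3      # many (five or more)


# Hypothetical discrepancy: MAT reports 3 marriages with BAN, BAN reports 5.
reports = {("MAT", "BAN"): 3, ("BAN", "MAT"): 5}
ties = reconcile(reports)
print(scale_value(ties[frozenset(("MAT", "BAN"))]))  # prints 3
```

Keying each pair by an unordered set keeps the reconciled table symmetric by construction.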


Since the data in Table 2.9 are symmetric, measures of attraction and
motivation cannot be computed. A better alternative to asking how many marriage
ties link a pair of dialects would have been to ask, "How many people from dialect
X have married someone from here and are living here?" This would yield a
nonsymmetric table of results. Since this question was not asked, the next best
thing is to use the available data to predict what the results might be. To do this
the following hypothesis is made: the number of couples residing in a particular
village is proportional to the size of the village. Thus, if there are X
marriage ties between two dialects with populations A and B, the number of those X
couples living in dialect area A will be (X)(A/(A+B)), and the number of couples
living in dialect area B will be (X)(B/(A+B)).
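
As a minimal sketch of this hypothesis, the split of X marriage ties between two dialect areas can be computed as below. The populations of 600 and 400 are hypothetical values chosen for illustration.

```python
def predicted_residence(ties, pop_a, pop_b):
    """Split X marriage ties between dialect areas A and B by population."""
    total = pop_a + pop_b
    in_a = ties * pop_a / total  # (X)(A/(A+B))
    in_b = ties * pop_b / total  # (X)(B/(A+B))
    return in_a, in_b


# Hypothetical case: a scale value of 3 shared by populations of 600 and 400.
in_a, in_b = predicted_residence(3, 600, 400)
print(round(in_a, 1), round(in_b, 1))  # prints 1.8 1.2
```

The two parts always sum to X, which is why paired entries in Table 2.10 (for example 1.2 and 1.8 for BAN and NEP) add up to a whole scale value.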

In Table 2.10, the data in Table 2.9 are transformed as detailed above in order
to reflect predicted patterns of marital residence. The dialects listed along the
top are labeled place of residence, and those along the left hand side are labeled
place of origin. The data are now nonsymmetric and measures of attraction and
motivation can be computed. The row and column means are divided by three in order
to compute a proportion from zero to one. The attraction figure can be loosely
interpreted as the proportion of the island's population which has contact with that
dialect because of marriage ties into that dialect. The motivation figure can be
loosely interpreted as the proportion of the island's population with which the
dialect has contact because of marriage ties outside that dialect.
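
The computation of the two measures can be sketched as follows, with motivation taken from the row means (place of origin) and attraction from the column means (place of residence). The 3 x 3 table is hypothetical, and averaging over every cell including the diagonal is an assumption of this sketch, not something the text specifies.

```python
def attraction_and_motivation(table, max_score=3):
    """Return (attraction, motivation) lists for a square origin-by-residence table.

    Motivation is each row mean divided by max_score; attraction is each
    column mean divided by max_score. Assumption: the mean runs over all
    cells, including the diagonal.
    """
    n = len(table)
    motivation = [sum(row) / n / max_score for row in table]
    attraction = [sum(table[i][j] for i in range(n)) / n / max_score
                  for j in range(n)]
    return attraction, motivation


# Hypothetical nonsymmetric table for three dialects.
table = [
    [3.0, 1.2, 0.0],
    [1.8, 3.0, 0.6],
    [0.0, 0.4, 3.0],
]
attraction, motivation = attraction_and_motivation(table)
print([round(v, 2) for v in attraction])
print([round(v, 2) for v in motivation])
```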


2.1.11 Estimated intelligibility 

In the field it was possible to test only 78 out of the possible 169
intelligibility relations among the 13 dialects. On the basis of the models
developed in Section 6.3 to explain those 78 cases, the remaining untested relations
can be estimated. Table 2.11 gives a complete matrix of estimated intelligibility.
The estimates agree with the measurements in 95% of the cases. The four cases where
the estimate differs from the measurement are underlined.

Table 2.9 Marriage ties

Number of marriage ties between dialects

0 = no marriages
1 = one marriage
2 = some marriages (two to four)
3 = many marriages (five or more)



Table 2.10 Predicted marital residence

Place of residence (columns); place of origin (rows):

        NEO  MAT  BAN  NEP  LWO  VEN  NEM  BYO  NOP  NEA  NOO  MBI  NNG  Mot

NEO     3.0  0.0  1.4  0.0  1.9  0.6  0.0  0.0  0.0  0.0  0.0  0.0  0.5  .19
MAT     0.0  3.0  2.4  0.7  0.0  1.4  0.0  0.0  0.0  0.0  0.0  0.0  0.6  .19
BAN     0.6  0.6  3.0  1.2  0.9  1.2  0.3  0.0  0.5  0.3  0.4  0.0  0.9  .22
NEP     0.0  0.3  1.8  3.0  1.1  0.0  0.0  0.0  0.9  1.2  0.0  0.0  0.8  .21
LWO     1.1  0.0  1.1  0.9  3.0  1.3  0.7  0.0  0.3  0.4  0.0  0.0  1.1  .22
VEN     0.4  0.6  1.8  0.0  1.7  3.0  0.8  0.8  0.3  0.0  1.0  0.0  0.4  .27
NEM     0.0  0.0  0.7  0.0  1.3  1.2  3.0  1.5  0.0  0.6  0.6  0.4  0.0  .21
BYO     0.0  0.0  0.0  0.0  0.0  1.2  1.5  3.0  0.9  0.0  1.2  0.0  0.0  .12
NOP     0.0  0.0  1.5  2.1  0.7  0.7  0.0  1.1  3.0  1.8  0.7  0.0  0.6  .31
NEA     0.0  0.0  0.7  1.8  0.6  0.0  0.4  0.0  1.2  0.0  1.7  0.0  0.5  .22
NOO     0.0  0.0  0.6  0.0  0.0  1.0  0.4  0.8  0.3  1.3  3.0  1.6  1.2  .18
MBI     0.0  0.0  0.0  0.0  0.0  0.0  0.6  0.0  0.0  0.0  2.0  3.0  1.2  .10
NNG     0.5  0.0  2.1  1.2  1.9  0.6  0.0  0.0  0.4  0.5  1.7  0.8  3.0  .36

Att     .10  .06  .40  .23  .31  .27  .13  .10  .15  .18  .22  .06  .23



The method used to estimate intelligibility was a two-out-of-three method for
combining the three best predicting models. In most cases, the three models agree.
In the cases where they do not, the level predicted by two of the models is taken as
the estimated intelligibility.
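
The two-out-of-three rule can be sketched as a simple majority vote over the three predicted levels. The text does not say what happens if all three models disagree, so raising an error in that case is an assumption of this sketch.

```python
from collections import Counter


def two_out_of_three(level_a, level_b, level_c):
    """Return the intelligibility level predicted by at least two models."""
    level, votes = Counter((level_a, level_b, level_c)).most_common(1)[0]
    if votes < 2:
        # Assumption: a three-way disagreement is not covered by the text.
        raise ValueError("no two models agree")
    return level


print(two_out_of_three(3, 3, 3))  # prints 3
print(two_out_of_three(2, 3, 2))  # prints 2
```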

The first model is based on composite relative distance alone (Section 6.4),
where composite relative distance equals six-tenths times relative geographic
distance (Table 2.2) plus four-tenths times relative lexical distance (Table 2.5).
The step function for predicting intelligibility is (see final scattergram in
Appendix 2.2),

Int = 3, if composite distance ≤ 134%;
    = 2, if 134% < composite distance ≤ 185%;
    = 1, if 185% < composite distance.

This model is 90% accurate.
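
A minimal sketch of the first model follows, using the weights and thresholds stated above. The function names and the sample distances are hypothetical; distances are expressed as percentages.

```python
def composite_relative_distance(rel_geographic, rel_lexical):
    """Six-tenths relative geographic plus four-tenths relative lexical distance."""
    return 0.6 * rel_geographic + 0.4 * rel_lexical


def intelligibility_from_distance(composite):
    """Step function of the first model (composite distance in percent)."""
    if composite <= 134:
        return 3
    if composite <= 185:
        return 2
    return 1


# Hypothetical relative distances of 150% (geographic) and 100% (lexical).
distance = composite_relative_distance(150, 100)
print(intelligibility_from_distance(distance))  # prints 3
```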



The second model is a complex model (Section 6.5) with predicted contact. The
contact factor is predicted by the overall motivation of the listener's dialect as
indicated by opinions about intelligibility (Table 2.7) divided by the relative
geographic distance from the listener's dialect to the speaker's (Table 2.2). The
scaling factors for these two variables are described in Appendix 2.3. After the
two variables are scaled, they are multiplied to compute the C factor which plugs
into the formula for familiarity:

F = L + C(100 - L)



The step function which predicts intelligibility is then,

Int = 3, if 89% < Familiarity ≤ 100%;
    = 2, if 82% ≤ Familiarity ≤ 89%;
    = 1, if Familiarity < 82%.

This model is 92% accurate.
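
The second model can be sketched as below, assuming the familiarity formula reads F = L + C(100 - L), with L a percentage and C the contact factor scaled between zero and one. The input values are hypothetical.

```python
def familiarity(base_percent, contact):
    """F = L + C(100 - L), with L in percent and C between zero and one."""
    return base_percent + contact * (100 - base_percent)


def intelligibility_model_two(fam):
    """Step function of the second model (familiarity in percent)."""
    if fam > 89:
        return 3
    if fam >= 82:
        return 2
    return 1


# Hypothetical inputs: L = 80% with three different contact factors.
for contact in (0.0, 0.25, 0.5):
    fam = familiarity(80, contact)
    print(fam, intelligibility_model_two(fam))
```

With L held at 80%, contact of 0.0 leaves familiarity at 80 (level 1), 0.25 raises it to 85 (level 2), and 0.5 raises it to 90 (level 3).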



The third model is also a complex model with predicted contact. Lexical
distance from the center is used to estimate attraction and motivation. Contact is
predicted by the attraction of the speaker (inverse of distance from center) times
the motivation of the hearer (distance from center) divided by relative geographic
distance. The scaling factors for these three variables are described in Appendix
2.3. Contact is plugged into the familiarity formula as above. The step function
which predicts intelligibility is,



Int = 3, if 75% < Familiarity ≤ 100%;
    = 2, if 64% < Familiarity ≤ 75%;
    = 1, if Familiarity ≤ 64%.

This model is 90% accurate.
Table 2.11 Estimated intelligibility

3 = Full intelligibility
2 = Partial intelligibility
1 = Sporadic recognition

Dialect of speaker:

Dialect of hearers:
Int = 3, if composite distance ≤ 134%;
    = 2, if 134% < composite distance ≤ 185%;
    = 1, if 185% < composite distance.

This model is 90% accurate.
2.2 Scattergrams and step functions for single variable models


In the scattergrams, intelligibility is plotted on the
vertical axis and the predicting variable is plotted on the
horizontal axis. The plotted values are the letters of the
alphabet. A indicates that one observation is plotted at
that point, B indicates that two are, and so on. The steps
of the step functions are indicated by underscores. Below
each scattergram three values are given: the sum of the
deviations of predicted values of intelligibility from the
measured values, the ratio of prediction accuracy, and the
percentage of prediction accuracy.
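
The three summary values can be sketched as follows. The sketch assumes that the accuracy ratio is the number of predictions minus the sum of deviations, over the number of predictions; that reading reproduces the pairs reported in this appendix (for example, a deviation sum of 6 over 78 predictions gives 92%), but the definition itself is an assumption. The sample lists are hypothetical.

```python
def prediction_summary(predicted, measured):
    """Return (sum of deviations, accuracy ratio, accuracy percentage)."""
    deviations = sum(abs(p - m) for p, m in zip(predicted, measured))
    total = len(measured)
    # Assumption: accuracy = (total - sum of deviations) / total.
    agreeing = total - deviations
    ratio = f"{agreeing}/{total}"
    percent = round(100 * agreeing / total)
    return deviations, ratio, percent


# Hypothetical predicted and measured intelligibility levels.
print(prediction_summary([3, 2, 1, 3], [3, 3, 1, 3]))  # prints (1, '3/4', 75)
```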
[Scattergram: Absolute Geographic Distance]
[Scattergram: Opinions about Intelligibility]
[Scattergram: Attendance at Church Festivals]
[Scattergram: Predicted Marriage Residence]
[Scattergram: Composite Relative Distance]
2.3 Scaling of variables for inclusion in contact factor


The contact factor in the predicting equation must take
on the range of zero to one to prevent predicting more than
100% intelligibility. The variables are scaled by the
following formula:

scaled value = (raw value - min) / (max - min)

Min is the value for that variable which should scale to
zero; max is the value which should scale to one. Scaled
values less than zero are set to zero, and those greater
than one are set to one. Note that when the min value is
zero, the formula reduces to a simple division:
raw value/max. In the case of the measured contact models,
the raw values are divided by the max values listed below
and plugged straight into the prediction formula. In the
case of the predicted contact models, as many as three
variables are involved: attraction, motivation, and
distance. In the case of opinions, festivals, and
marriages, the raw values for attraction and motivation are
taken from the outer row and column of the data tables in
Appendix 2.1. In the case of the other four factors, the
raw values are population, density, or distance from the
center (Mbanua). In all cases, the attraction measure is
associated with the speaker and the motivation measure is
associated with the hearer. The third variable involved in
predictions is distance. The distance measures are scaled
so as to invert them, that is, far distance yields a low
value and close distance yields a high value. In this way
the three scaled variables can be multiplied to compute a
contact factor in the scale of zero to one. The scaling of
population and density was handled a little differently.
The equations used are reported in the following table.
Also, for these two variables, the product of attraction and
motivation was further scaled as indicated in the table.
With all of this information it should be possible to
replicate the results I report in the remaining sections of
this appendix.
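
The scaling and the resulting contact factor can be sketched as follows. The min and max limits below are hypothetical placeholders, not the values used in the study. Inverting distance by passing the far distance as min and the near distance as max is one way to obtain the behaviour described, and is an assumption of this sketch.

```python
def scale(raw, lo, hi):
    """Scale raw so that lo maps to zero and hi maps to one, clamped to [0, 1]."""
    value = (raw - lo) / (hi - lo)
    return min(1.0, max(0.0, value))


def contact_factor(attraction, motivation, distance, limits):
    """Multiply the three scaled variables to get a contact factor in [0, 1]."""
    a = scale(attraction, *limits["attraction"])
    m = scale(motivation, *limits["motivation"])
    d = scale(distance, *limits["distance"])
    return a * m * d


# Hypothetical (min, max) limits. For distance, min is the far value and
# max is the near value, so that far distances scale toward zero.
limits = {
    "attraction": (0.0, 0.5),
    "motivation": (0.0, 0.5),
    "distance": (200.0, 0.0),
}
print(contact_factor(0.25, 0.5, 50.0, limits))  # prints 0.375
```

Because each scaled variable is clamped to the range zero to one, the product cannot exceed one, which keeps predicted familiarity at or below 100%.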
2.4 Scattergrams for complex models with measured contact

[Scattergram: Festival attendance as measure of contact]
[Scattergram: Marriage ties as measure of contact]
[Scattergram: Marriage residence as measure of contact]
2.5 Results for complex models with predicted contact

Following are seven tables, one for each of the
variables used to predict contact. For each variable,
eighteen different combinations of values in the numerator
and denominator of the contact formula were used. The three
rows represent three different numerators: attraction
alone, motivation alone, and attraction times motivation.
The six columns represent six different denominators: no
distance (a constant value of one), absolute geographic
distance, relative geographic distance, absolute lexical
distance, relative lexical distance, and composite relative
distance. At the intersection of each row and column two
values are given. The first is the sum of the deviations of
predicted from measured values of intelligibility; the
second, in parentheses, is the percentage of prediction
accuracy. The total number of predictions on which the
percentages are based is given in the heading for each
table.
Opinions about intelligibility
(78 total predictions)

             None       Geographic            Lexical               Composite
                        Absolute   Relative   Absolute   Relative   Relative

Attraction   13 (83%)   20 (74%)   12 (85%)   18 (77%)    9 (88%)   11 (86%)
Motivation   18 (77%)   16 (79%)    6 (92%)   14 (82%)    8 (90%)    7 (91%)
Attr & Mot   10 (87%)   16 (79%)    8 (90%)   14 (82%)    6 (92%)    6 (92%)

Church festival attendance
(57 total predictions)

             None       Geographic            Lexical               Composite
                        Absolute   Relative   Absolute   Relative   Relative

Attraction    8 (86%)   12 (79%)    8 (86%)   11 (81%)    7 (88%)    9 (84%)
Motivation   20 (65%)   17 (70%)   12 (79%)   15 (74%)   12 (79%)   12 (79%)
Attr & Mot   13 (77%)   12 (79%)   10 (82%)   10 (82%)    8 (86%)    8 (86%)

Marriage residence
(78 total predictions)

             None       Geographic            Lexical               Composite
                        Absolute   Relative   Absolute   Relative   Relative

Attraction   23 (71%)   19 (76%)   16 (79%)   17 (78%)   16 (79%)   16 (79%)
Motivation   25 (68%)   21 (73%)   15 (81%)   18 (77%)   11 (86%)   14 (82%)
Attr & Mot   22 (72%)   19 (76%)   16 (79%)   17 (78%)   16 (79%)   16 (79%)
Population
(78 total predictions)

             None       Geographic            Lexical               Composite
                        Absolute   Relative   Absolute   Relative   Relative

Attraction   23 (71%)   17 (78%)   16 (79%)   18 (77%)   15 (81%)   16 (79%)
Motivation   22 (72%)   16 (79%)   13 (83%)   16 (79%)   14 (82%)   14 (82%)
Attr & Mot   19 (76%)   16 (79%)   15 (81%)   18 (77%)   16 (79%)   15 (81%)



Density of population
(78 total predictions)

             None       Geographic            Lexical               Composite
                        Absolute   Relative   Absolute   Relative   Relative

Attraction   21 (73%)   18 (77%)   16 (79%)   16 (79%)   14 (82%)   15 (81%)
Motivation   22 (72%)   15 (81%)   15 (81%)   15 (81%)   14 (82%)   15 (81%)
Attr & Mot   18 (77%)   15 (81%)   13 (83%)   17 (78%)   12 (85%)   14 (82%)



Geographic distance from center
(78 total predictions)

             None       Geographic            Lexical               Composite
                        Absolute   Relative   Absolute   Relative   Relative

Attraction   16 (79%)   19 (76%)   13 (83%)   17 (78%)   10 (87%)   12 (85%)
Motivation   14 (82%)   13 (83%)   10 (87%)   14 (82%)   11 (86%)   10 (87%)
Attr & Mot   11 (86%)   15 (81%)   11 (86%)   15 (81%)   10 (87%)   11 (86%)

Lexical distance from center
(78 total predictions)

             None       Geographic            Lexical               Composite
                        Absolute   Relative   Absolute   Relative   Relative

Attraction   14 (82%)   18 (77%)   12 (85%)   15 (81%)   11 (86%)   12 (85%)
Motivation   15 (81%)   13 (83%)   10 (87%)   13 (83%)    8 (90%)   10 (87%)
Attr & Mot   11 (86%)   14 (82%)    8 (90%)   13 (83%)    9 (88%)    8 (90%)